Re: [DISCUSS] The state of the project - 2017

Jacques Nadeau Thu, 09 Nov 2017 18:31:06 -0800

Michael,

I think the ownership thinking is a really good idea. Things like trait
behaviors, volcano, types, hep, decorrelation, parsing, sql-to-rel,
materialized views are all good chunks that could be owned by someone (in
addition to Avatica and each of the connectors).


On Wed, Nov 8, 2017 at 2:50 PM, Michael Mior <[email protected]> wrote:

> Interesting thoughts about the paper you pointed to, Julian. I believe I
> read it some time ago, but I'll have to dust it off and think about it in
> the context of Calcite. All your other thoughts also sound like exciting
> directions for Calcite.
>
> I hope we can all find ways to take some of the burden off your shoulders.
> While I am happy to serve as PMC chair, I'm still working on familiarizing
> myself with the code base to the point where I can more quickly review PRs
> with some level of confidence. (For the time being, I'm also not actively
> using Calcite.) I wonder if others would be willing to step up to "own"
> parts of the code base (e.g. as Josh does in many ways with Avatica). I
> think if we could have the majority of components on JIRA assigned by
> default to someone other than you, that might be a start. Of course,
> practically speaking so much is contained within core, that this might have
> marginal impact. We could also consider (on JIRA only) creating some
> additional components to further partition things.
>
> I forgot when I was thinking about CI that you have your own build suite
> running for the project which is much appreciated :) But I'm sure we would
> both agree that it would be nice if this extra testing wasn't resting
> solely on you. I'll start a separate thread when I have time to start
> hacking on CI-related things to get some more input.
>
> --
> Michael Mior
> [email protected]
>
> 2017-11-08 16:34 GMT-05:00 Julian Hyde <[email protected]>:
>
> > Thanks for starting this discussion, Jesus. Here are some thoughts, in
> > no particular order.
> >
> > I too have noticed the increase in academic adoption. This is
> > excellent. Shall we add a section to the "Powered by" page [1] on
> > academic projects and papers?
> >
> > I worry a lot about audience (or audiences). Who is using Calcite? Are
> > we giving them what they need? Data engines (such as Drill, Hive and
> > Flink) are one category, and I think they are fairly well served.
> > Academics are another audience; some are succeeding, but I wonder
> > whether it would be easier for them if we had some relevant examples,
> > such as how to parse a query and optimize it using several different
> > cost models and combinations of rules. What other audiences are there?
> >
> > There is an audience who would like to use Calcite as a standalone
> > engine; and folks who would like to incorporate materialized views,
> > indexes and constraints into their engine but prefer to speak SQL
> > rather than Java APIs. Those groups are not well served today. I am
> > working on a server which has DDL support[2][3]; it would provide a
> > (simple) standalone engine, but also allow us to demo materialized
> > views, virtual columns, check constraints and foreign tables/schemas
> > via SQL so that people building engines can more easily grasp the
> > concepts.
> >
> > I read Trummer & Koch's paper "Multi-objective parametric query
> > optimization" [4] in CACM recently. It is a very exciting advance, and
> > too much to cover in this thread, but it got me thinking about how
> > Calcite could evolve to incorporate their ideas. I realize that giving
> > RelOptCost multiple fields was a mistake, unless we also add the
> > mechanics (piecewise-linear cost functions and polytopes) to handle
> > them. The vast majority of Calcite remains applicable, so this would
> > be evolutionary: Calcite's rules and algebra emerge intact in the new
> > order, and Calcite's metadata framework can model the new cost
> > functions. Extending Calcite could raise some interesting research
> > topics; is it possible to extend the parameter space (either the
> > number of parameters or the value range of those parameters) after
> > initial planning?; can we use parameters to model whether
> > intermediate results are materialized (see [5]) or whether ephemeral
> > materialized views happen to be present in cache?; what new statistics
> > do we need to gather to power the new cost functions? There is enough
> > here to interest several researchers.
> >
> > As for features:
> > * I would like to get to full compliance with OpenGIS, because spatial
> > support is much more straightforward in Calcite's algebraic approach
> > than in engines which need to build a new data structure.
> > * I also would like to give users a choice of engines in Calcite:
> > Spark and perhaps something based on Arrow, in addition to the
> > existing Enumerable engine.
> > * I would like to continue to make the planner more modular, so that
> > people can supply a program (a collection of rules organized into
> > planning phases) and basically just say "go".
> > * And I plan to continue my work to make data systems learn and adapt,
> > creating and populating materialized views based on observed query
> > patterns and data statistics.
> >
> > Regarding governance. I think we are functioning well as a
> > meritocratic community. High-quality contributions arrive from people
> > who have never contributed before; this is happening more and more
> > frequently, which is really excellent. On the other hand, this
> > increases the load for reviewing (and pro-actively fixing)
> > contributions, and too much of that work still falls on my shoulders.
> > There are times when I get close to burn out, especially when people
> > explicitly direct questions and pull requests at me.
> >
> > I think Michael would be an excellent PMC chair. I am delighted that
> > he is prepared to do the job.
> >
> > Regarding CI. There is a bit more CI going on than meets the eye; I
> > run several tests nightly on my home server, and also on a Windows VM,
> > and speak up if things get broken. But I admit there has been bit-rot
> > in some of the adapters, and having a public CI for those adapters
> > would be useful, if we can do so without generating too much noise.
> >
> > Julian
> >
> > [1] https://calcite.apache.org/docs/powered_by.html
> >
> > [2] https://issues.apache.org/jira/browse/CALCITE-707
> >
> > [3] https://issues.apache.org/jira/browse/CALCITE-1991
> >
> > [4] https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/abstract
> >
> > [5] https://issues.apache.org/jira/browse/CALCITE-481
> >
> > On Tue, Nov 7, 2017 at 9:19 AM, Josh Elser <[email protected]> wrote:
> > > On 11/6/17 12:00 PM, Jesus Camacho Rodriguez wrote:
> > >>
> > >> I am not involved in the Avatica effort, but it has been great to see
> > >> Avatica continue maturing, moving into its own repository and following
> > >> with its own release cadence. Josh, Julian, if you want to add a few
> > >> lines about the state of Avatica, that would be great.
> > >
> > >
> > > Would be happy to :)
> > >
> > > I've certainly been spending less time on core-functionality. Avatica has
> > > definitely passed the cusp for what most developers need. The majority of
> > > users would find Avatica to be fully-featured as a JDBC interface (but
> > > there are some gaps that still exist).
> > >
> > > We've started to see the focus on non-JDBC drivers for Avatica which is a
> > > great sign. Our Francis has been making progress on trying to adopt the
> > > driver written in Go into the Apache codebase. There are a few other
> > > drivers available as well. The presence of these drivers, and their
> > > ability to continue to function is good validation of the
> > > protocol/stability model that we outlined/implemented in the past 1-2
> > > years.
> > >
> > > Avatica is still fairly low-volume, with only a few people contributing.
> > > I'd love to see more people take an interest (it's a great stepping stone
> > > into Calcite too ;P).
> >
>
