Michael, I think the ownership idea is a really good one. Things like trait behaviors, Volcano, types, Hep, decorrelation, parsing, sql-to-rel, and materialized views are all good chunks that could be owned by someone (in addition to Avatica and each of the connectors).
On Wed, Nov 8, 2017 at 2:50 PM, Michael Mior <[email protected]> wrote:
> Interesting thoughts about the paper you pointed to Julian. I believe I
> read it some time ago, but I'll have to dust it off and think about it in
> the context of Calcite. All your other thoughts also sound like exciting
> directions for Calcite.
>
> I hope we can all find ways to take some of the burden off your shoulders.
> While I am happy to serve as PMC chair, I'm still working on familiarizing
> myself with the code base to the point where I can more quickly review PRs
> with some level of confidence. (For the time being, I'm also not actively
> using Calcite.) I wonder if others would be willing to step up to "own"
> parts of the code base (e.g. as Josh does in many ways with Avatica). I
> think if we could have the majority of components on JIRA assigned by
> default to someone other than you, that might be a start. Of course,
> practically speaking so much is contained within core, that this might have
> marginal impact. We could also consider (on JIRA only) creating some
> additional components to further partition things.
>
> I forgot when I was thinking about CI that you have your own build suite
> running for the project which is much appreciated :) But I'm sure we would
> both agree that it would be nice if this extra testing wasn't resting
> solely on you. I'll start a separate thread when I have time to start
> hacking on CI-related things to get some more input.
>
> --
> Michael Mior
> [email protected]
>
> 2017-11-08 16:34 GMT-05:00 Julian Hyde <[email protected]>:
>
> > Thanks for starting this discussion, Jesus. Here are some thoughts, in
> > no particular order.
> >
> > I too have noticed the increase in academic adoption. This is
> > excellent. Shall we add a section to the "Powered by" page [1] on
> > academic projects and papers?
> >
> > I worry a lot about audience (or audiences). Who is using Calcite? Are
> > we giving them what they need?
> > Data engines (such as Drill, Hive and
> > Flink) are one category, and I think they are fairly well served.
> > Academics are another audience; some are succeeding, but I wonder
> > whether it would be easier for them if we had some relevant examples,
> > such as how to parse a query and optimize it using several different
> > cost models and combinations of rules. What other audiences are there?
> >
> > There is an audience who would like to use Calcite as a standalone
> > engine; and folks who would like to incorporate materialized views,
> > indexes and constraints into their engine but prefer to speak SQL
> > rather than Java APIs. Those groups are not well served today. I am
> > working on a server which has DDL support[2][3]; it would provide a
> > (simple) standalone engine, but also allow us to demo materialized
> > views, virtual columns, check constraints and foreign tables/schemas
> > via SQL so that people building engines can more easily grasp the
> > concepts.
> >
> > I read Trummer & Koch's paper "Multi-objective parametric query
> > optimization" [4] in CACM recently. It is a very exciting advance, and
> > too much to cover in this thread, but it got me thinking about how
> > Calcite could evolve to incorporate their ideas. I realize that giving
> > RelOptCost multiple fields was a mistake, unless we also add the
> > mechanics (piecewise-linear cost functions and polytopes) to handle
> > them. The vast majority of Calcite remains applicable, so this would
> > be evolutionary: Calcite's rules and algebra emerge intact in the new
> > order, and Calcite's metadata framework can model the new cost
> > functions.
> > Extending Calcite could raise some interesting research
> > topics; is it possible to extend the parameter space (either the
> > number of parameters or the value range of those parameters) after
> > initial planning?; can we use parameters to model whether
> > intermediate results are materialized (see [5]) or whether ephemeral
> > materialized views happen to be present in cache?; what new statistics
> > do we need to gather to power the new cost functions? There is enough
> > here to interest several researchers.
> >
> > As for features:
> > * I would like to get to full compliance with OpenGIS, because spatial
> > support is much more straightforward in Calcite's algebraic approach
> > than in engines which need to build a new data structure.
> > * I also would like to give users a choice of engines in Calcite:
> > Spark and perhaps something based on Arrow, in addition to the
> > existing Enumerable engine.
> > * I would like to continue to make the planner more modular, so that
> > people can supply a program (a collection of rules organized into
> > planning phases) and basically just say "go".
> > * And I plan to continue my work to make data systems learn and adapt,
> > creating and populating materialized views based on observed query
> > patterns and data statistics.
> >
> > Regarding governance. I think we are functioning well as a
> > meritocratic community. High-quality contributions arrive from people
> > who have never contributed before; this is happening more and more
> > frequently, which is really excellent. On the other hand, this
> > increases the load for reviewing (and pro-actively fixing)
> > contributions, and too much of that work still falls on my shoulders.
> > There are times when I get close to burn out, especially when people
> > explicitly direct questions and pull requests at me.
> >
> > I think Michael would be an excellent PMC chair. I am delighted that
> > he is prepared to do the job.
> >
> > Regarding CI.
> > There is a bit more CI going on than meets the eye; I
> > run several tests nightly on my home server, and also on a Windows VM,
> > and speak up if things get broken. But I admit there has been bit-rot
> > in some of the adapters, and having a public CI for those adapters
> > would be useful, if we can do so without generating too much noise.
> >
> > Julian
> >
> > [1] https://calcite.apache.org/docs/powered_by.html
> >
> > [2] https://issues.apache.org/jira/browse/CALCITE-707
> >
> > [3] https://issues.apache.org/jira/browse/CALCITE-1991
> >
> > [4] https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/abstract
> >
> > [5] https://issues.apache.org/jira/browse/CALCITE-481
> >
> > On Tue, Nov 7, 2017 at 9:19 AM, Josh Elser <[email protected]> wrote:
> > > On 11/6/17 12:00 PM, Jesus Camacho Rodriguez wrote:
> > >>
> > >> I am not involved in the Avatica effort, but it has been great to see
> > >> Avatica continue maturing, moving into its own repository and following with
> > >> its own release cadence. Josh, Julian, if you want to add a few lines about
> > >> the state of Avatica, that would be great.
> > >
> > >
> > > Would be happy to :)
> > >
> > > I've certainly been spending less time on core-functionality. Avatica has
> > > definitely passed the cusp for what most developers need. The majority of
> > > users would find Avatica to be fully-featured as a JDBC interface (but there
> > > are some gaps that still exist).
> > >
> > > We've started to see the focus on non-JDBC drivers for Avatica which is a
> > > great sign. Our Francis has been making progress on trying to adopt the
> > > driver written in Go into the Apache codebase. There are a few other drivers
> > > available as well. The presence of these drivers, and their ability to
> > > continue to function is good validation of the protocol/stability model that
> > > we outlined/implemented in the past 1-2 years.
> > >
> > > Avatica is still fairly low-volume, with only a few people contributing. I'd
> > > love to see more people take an interest (it's a great stepping stone into
> > > Calcite too ;P).
> > >
