Re: [DISCUSS] The state of the project

Jesus Camacho Rodriguez Wed, 05 Oct 2016 01:41:39 -0700

It is indeed great to see the wide adoption of Calcite. For instance, in the
upcoming Big Data Europe I count at least 5 presentations that will talk in
more or less detail about Calcite. And an anecdote: after the last release, I
got a question from a business analyst about Calcite. He kept hearing about
the project in different venues (Hadoop Summit, Flink Forward, etc.), but
none of the commercial distributions specifically indicated that they
support it, so he got curious: what was this project?


As stated previously, I think that continuing to improve the support for
streaming and semi-structured data, both at the interfacing and optimization
level, is key to making Calcite a toolbox that can be used by as many data
processing systems as possible now and in the near future. The wide adoption
is also helping us consolidate the code, extending test coverage with every
new release, which is great.

I think a valid point was raised regarding documentation. IMO, it is not a
problem of content (although that could be improved), but rather of how the
information is presented: we should show more clearly 'what' you can find on
the website and 'where' you can find it. I took a look at the presentation by
Jordan and it looks great! It would be nice to integrate some of its content
into the website.
I think we had this discussion when the project was still incubating, but
maybe we could start it again in a different thread since the community has
grown a lot since then: how to reorganize the content so it is clear for
newcomers what the project is, what it does, its different components, and
ways to integrate it with your own system.
Maybe cleaning up the design (I personally like what they did recently with
the Apache Kafka website[1]) would help too.

An additional topic to discuss: since Calcite is mainly used as a toolbox,
most users and devs work primarily in other projects. Probably due to that,
on some occasions I notice a tendency to 'ask' for support for 'x' and 'y'
instead of trying to implement and contribute solutions to the project.
Further, even when features are implemented, it might be done in those
projects instead of in Calcite itself, and they may never be contributed back
to the core. Let me be clear: I do not mean it is done in bad faith; in many
cases it might be due to time constraints, something I have experienced
first-hand. However, I wanted to raise this question for the rest of the
community, especially the most experienced members: is this a risk for
Calcite development? How do other Apache projects that are mainly used as a
library, rather than as a stand-alone project, deal with this? Do they do
anything special to encourage the community to contribute code back to the
core?

Regarding the PMC chair, thanks for proposing my name, Julian. I truly enjoy
contributing to Calcite, and if the rest of the community agrees, I would be
happy to step up and continue contributing from that new role for one year.
But you all need to bear in mind that I will never be able to fill Julian's
shoes or match the great work he has done as the current PMC chair!

--
Jesús

[1] http://kafka.apache.org/






On 10/5/16, 8:43 AM, "Jungtaek Lim" <kabh...@gmail.com> wrote:

>Thanks, all, for putting effort into maintaining and improving this amazing
>project. Storm SQL relies heavily on Calcite, and it gives us lots of benefits.
>
>Btw, I second CPC's opinion: some areas, like using Calcite as a JDBC
>driver (and adapter), are well documented, but other areas that require an
>understanding of core concepts (like integrating Calcite into another
>project) are not documented and are left to each individual's understanding
>of Calcite.
>
>Coincidentally, the slides Jordan shared today do a great job of explaining
>the core concepts of Calcite. Since the content is tailored to a
>presentation, it would be great to have a document version of the slides
>(I mean with more text and explanation) on the website.
>
>Thanks again,
>Jungtaek Lim (HeartSaVioR)
>
>On Wed, Oct 5, 2016 at 3:27 PM, CPC <acha...@gmail.com> wrote:
>
>I think Calcite is an amazing project, and let me thank you for all your
>efforts. To bring in more users and new committers, I think documentation is
>really important. As a user, it took a vast amount of my time to understand
>concepts and other things, because without some core concepts it is hard to
>know where to look. I think it would be good to have some docs on the core
>concepts and how they are represented in Calcite.
>
>Regards...
>Anil halil
>
>On Oct 5, 2016 02:57, "Josh Elser" <els...@apache.org> wrote:
>
>>
>>
>> Julian Hyde wrote:
>>
>>> Hi Calcite community members,
>>>
>>> In a few weeks (22nd October) it will be a year since Calcite graduated
>>> to a top-level Apache project[1]. I think it’s been a good year!
>>>
>>> When we graduated, we decided to have an annual “state of the project”
>>> discussion and to vote for a new PMC chair/VP[2]. So, I’m kicking off both
>>> of those discussions.
>>>
>>> First, a few of my thoughts.
>>>
>>> I am pleased with the general rate of progress of the project. I’m
>>> pleased to see an increasing number of contributions from new contributors,
>>> and some of those becoming committers and PMC members. A couple of
>>> highlights this year were adapters for Cassandra and Elasticsearch that came
>>> out of the blue. I’m also pleased that we have continued a regular release
>>> cadence. This makes it easier for projects to use Calcite, and knowing that
>>> pull requests will be promptly reviewed and included in a release gives
>>> people an incentive to contribute.
>>>
>>> Calcite is becoming an ever better optimizer for SQL queries. This is
>>> helped immeasurably by the fact that Hive, Phoenix, Drill, Qubole and
>>> others are using Calcite for this and are contributing back. (Thanks to
>>> those communities for their continued collaboration!)
>>>
>>
>> +1 this has been awesome to watch :)
>>
>>> But I also believe that Calcite can be used for non-traditional databases.
>>> Some examples:
>>>
>>> 1. I am a fan of what Drill have done with schema-less query processing
>>> and document-oriented data, and would like to bring similar functionality
>>> into core Calcite.
>>>
>>
>> I remember seeing a presentation on Drill a while back (very much an
>> "intro to Drill" by someone not affiliated with Calcite either). The way
>> the content was presented made the influence of Calcite on their
>> architecture very clear. Very cool to see!
>>
>>> 2. I also like the idea of Calcite being a “toolkit” from which one can
>>> build a database (relational or non-relational). Phoenix have been going
>>> through the process of converting their existing parser and planner to use
>>> Calcite, and I have learned a lot. But a lot still needs to be done to make
>>> Calcite easier to use as a framework.
>>>
>>> 3. I have been building consensus that SQL is a great language for stream
>>> processing[3], and working with Apex, Flink, Samza, Storm to build the
>>> pieces to implement streaming SQL. I am very excited about the way
>>> streaming SQL is gaining acceptance. Are there any other emerging areas
>>> that Calcite should be targeting?
>>>
>>> Avatica continues to grow and mature. The Avatica site now lists clients
>>> in 4 languages[4], and there is also an ODBC driver (not open source)[5].
>>> The “one repo, one community, two web sites, two releases” strategy seems
>>> to be working adequately. But where do we see the project going? Would it
>>> help if it had its own namespace (org.apache.avatica) or web site
>>> (http://avatica.apache.org)? Might it be a top-level project someday?
>>>
>>
>> I think, in time, Avatica could easily grow into its own entity. I don't
>> think we're there yet.
>>
>> I will say that I think there's been a regular amount of confusion about
>> Avatica and Calcite sharing a repository but not following the same
>> versioning scheme. People seem to be a bit confused when I tell them that
>> the two projects are not "attached" to one another (they are separate Maven
>> projects).
>>
>> I think pulling Avatica into its own repo would encourage it to become its
>> own entity (as well as drawing a clear line between the Calcite and
>> Avatica codebases), but I think this is low-priority (as there are few of
>> us doing Avatica work), and we need to do a better job of clearly stating
>> what Avatica is/does and what its API is.
>>
>>> Regarding community: are we doing enough to reach out and bring new
>>> members into the community? Some of us have given talks at conferences and
>>> meetups over the last 12 months. Could we improve our geographical reach?
>>> Are there other things we could do to make the project more welcoming to
>>> new contributors? Could we do more to reach out to women and other
>>> demographic groups underrepresented in our community?
>>>
>>> What else are we doing well in the project? What are areas where we need
>>> to do better?
>>>
>>
>> As you outlined, adoption across other projects has been great. What about
>> adoption by users? I know the last time I tried to hack some SQL system
>> together with Calcite (albeit quite a while ago), I was left wondering
>> what the "public API" was (which classes I should use versus which are
>> "internal"). I think we still see a fair number of requests for
>> "hand-holding" as well. I'm not sure how we make this better (or if it's
>> the best use of time -- the CSV example goes very far already!). Just a
>> comment.
>>
>>> Lastly, since I agreed to step down as VP after 12 months, let’s start
>>> talking about a replacement. Being PMC chair is a privilege and it has
>>> taught me a huge amount about how Apache works. I think that Jesús Camacho
>>> Rodríguez could do an excellent job, if he is willing. Which other
>>> candidates should we consider?
>>>
>>
>> +1 to Jesús if he's amenable to it. He's been a pleasure to work with and
>> I'd have no complaints. Also happy to entertain others who would like to
>> step up (without volunteering them myself ;P)
>>
>>> Please take some time to share your thoughts about the state of the
>>> project.
>>>
>>> Julian
>>>
>>> (VP Apache Calcite)
>>>
>>> [1] http://calcite.apache.org/news/2015/10/22/calcite-graduates/
>>>
>>> [2] http://mail-archives.apache.org/mod_mbox/incubator-calcite-dev/201509.mbox/%3CCF8D6F96-706F-4502-B41D-0689E357209D%40apache.org%3E
>>>
>>> [3] https://calcite.apache.org/community/#streaming-sql
>>>
>>> [4] http://calcite.apache.org/avatica/docs/#clients
>>>
>>> [5] https://hortonworks.com/hadoop-tutorial/bi-apache-phoenix-odbc/
>>>
>>>
>>>
