Martijn:

I'd interpret Julian's response as welcoming you to contribute to Calcite :-)

Sounds like there is concrete room for:
* reviewing (e.g. testing and commenting on PRs, but not actually merging; this
might make it easier/quicker for the current committers, freeing them to
address other/more things)
* security bug fixes (e.g. if these get addressed, committers are freed up for
additional reviewing and other work)
* the usual laundry list of ways open source projects welcome helping hands [it
seems you know the drill via Flink].

Cheers,
Austin


On Tue, Jun 21, 2022 at 10:57 AM Julian Hyde <jhyde.apa...@gmail.com> wrote:

> Please don’t fork Calcite.
>
> Calcite suffers from the tragedy of the commons. Unlike many open source
> data projects, there is no commercial project that directly maps to Calcite
> (even though Calcite is an essential part of many projects). As a result, no
> engineers work full-time on Calcite.
>
> It takes more than pull requests to keep a project going. We need
> reviewers, people to work on releases, and people to fix bugs (such as
> security bugs) that are important to everyone but urgent to no one.
>
> We have plenty of committers in Calcite, and add several more per year. We
> rely on those committers taking on their share of the housework, but the
> burden falls on too few people.
>
> Engineering managers need to start paying a little more for the “free
> lunch” that they enjoy when Calcite “just works” in their project. Sadly,
> most engineering managers are not subscribed to this list.
>
> Julian
>
>
> > On Jun 21, 2022, at 9:49 AM, Jacques Nadeau <jacq...@apache.org> wrote:
> >
> > Martijn, thanks for sharing that thread in the Flink community.
> >
> > I'm someone who has forked Calcite twice: once in Apache Drill and again in
> > Dremio. In both cases, it was all about trading short-term benefits against
> > long-term costs. In both cases, I think the net amount of work was probably
> > 5x as much as it would have been if we had just done a better job engaging
> > the community. If I were to sketch the curve of effort over six years, I'd
> > guess that in both cases the numbers looked like this:
> >
> > estimated effort doing high-intensity integration with Calcite (years 1-6)
> > fork: 1, 5, 10, 50, 100, 200, total = 366
> > non-fork: 10, 10, 10, 10, 10, total = 50
> >
> > So yes, for the first couple of years you're ahead. But you pay a massive
> > technical debt premium in the long term. Early in a project's (Drill) or a
> > company's (Dremio) life, it can make sense to sacrifice the long term for
> > the short term, but it's important that people do it with their eyes open.
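A quick way to see where the curve turns is to accumulate the per-year estimates. The Java sketch below uses the numbers exactly as quoted (the non-fork series lists five yearly values) and reports the first year in which the fork's running total exceeds the non-fork's. `ForkEffort` and its methods are illustrative names, and the inputs are rough guesses, not measurements.

```java
import java.util.Arrays;

/** Running totals of the per-year effort guesses quoted above (arbitrary units). */
public class ForkEffort {
    // Per-year estimates copied as quoted; the non-fork series has five values.
    static final int[] FORK = {1, 5, 10, 50, 100, 200};
    static final int[] NON_FORK = {10, 10, 10, 10, 10};

    /** Returns the cumulative sum of the given per-year values. */
    static int[] cumulative(int[] perYear) {
        int[] out = new int[perYear.length];
        int sum = 0;
        for (int i = 0; i < perYear.length; i++) {
            sum += perYear[i];
            out[i] = sum;
        }
        return out;
    }

    /** First 1-based year where the fork's running total exceeds the non-fork's, or -1 if never. */
    static int crossoverYear() {
        int[] fork = cumulative(FORK);
        int[] nonFork = cumulative(NON_FORK);
        // Only compare years for which both series have an estimate.
        for (int i = 0; i < Math.min(fork.length, nonFork.length); i++) {
            if (fork[i] > nonFork[i]) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("fork:     " + Arrays.toString(cumulative(FORK)));
        System.out.println("non-fork: " + Arrays.toString(cumulative(NON_FORK)));
        System.out.println("fork costs more from year " + crossoverYear());
    }
}
```

Run as written, it should print the running totals `[1, 6, 16, 66, 166, 366]` and `[10, 20, 30, 40, 50]`, followed by `fork costs more from year 4`: the fork stays cheaper for the first three years, after which the premium dominates.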
> >
> > The reason this pain is so high is that as your codebases diverge, you
> > start having to do everything the Calcite community does by yourself.
> > Backports become harder, and things that you need (e.g. new SQL syntax)
> > have to be reimplemented, even if someone else already implemented them in
> > some post-fork Calcite version. Ultimately, at some point you realize that
> > your path is untenable and you unfork. This becomes the biggest expense of
> > them all, and I believe both of those teams are still trying to unfork.
> >
> > An additional, even bigger problem is that your absence from the Calcite
> > community means people may take the project or its APIs in directions that
> > directly conflict with how you use the library. Since you're not active in
> > the project, you fail to provide a counterpoint, and then you're basically
> > just in a miserable place. The Hive project did this best by ensuring that
> > releases of Calcite were also run pre-release against Hive to make sure no
> > major regressions occurred. Being in the community and active is the best
> > state from my pov. (It makes your project better and Calcite better.)
> >
> > Two last notes:
> > - I'm not sure the RocksDB fork is comparable to forking Calcite. The API
> > surface area and community models are very different.
> > - This is all based on a high-intensity integration (using rules + planner
> > or SQL + rules + planner). Calcite is frustratingly monolithic, and if
> > someone were only going to use a small component, my opinion would likely
> > be very different.
> >
> > I'd send this to the Flink list, but I'm not subscribed. It'd be great if
> > you shared it with the people over there if you think they'd find it useful.
> >
> >
> >
> > On Tue, Jun 21, 2022 at 12:31 AM Martijn Visser <martijnvis...@apache.org> wrote:
> >
> >> Thanks Julian and Austin!
> >>
> >> Any reply to kick off some sort of discussion is worthwhile :D
> >> I definitely know the feeling of having more PRs open than you would like,
> >> looking at https://github.com/apache/flink/pulls :)
> >>
> >> There have been discussions in the Flink community about forking Calcite
> >> [1]. My personal preference at the moment is to see if we can build better
> >> collaboration and community. I believe that we can find people from the
> >> Flink community who can open and help review Calcite PRs that are
> >> interesting for the Flink community. The question is whether that will also
> >> help in the short term, since in the end it still requires a Calcite
> >> maintainer to review and merge.
> >>
> >> Best regards,
> >>
> >> Martijn
> >>
> >> [1] https://lists.apache.org/thread/1oqydpsm4mc55bkk440gx9lr9gf2rvf4
> >>
> >>
> >> On Mon, Jun 20, 2022 at 23:51, Austin Bennett <whatwouldausti...@gmail.com> wrote:
> >>
> >>> From the peanut gallery :-)  -->
> >>>
> >>> Wow; yes, lots of open PRs.  https://github.com/apache/calcite/pulls
> >>>
> >>> How can individuals from the Flink [sub-]community, and/or the more general
> >>> Calcite community, help lighten this load? Is much weight given to reviews
> >>> from non-committers? How can we increase the number of people capable of
> >>> providing worthwhile reviews [that are recognized as such]?
> >>>
> >>>
> >>>
> >>> On Mon, Jun 20, 2022 at 11:47 AM Julian Hyde <jhyde.apa...@gmail.com>
> >>> wrote:
> >>>
> >>>> Martijn,
> >>>>
> >>>> Since you requested a reply, I am replying. To answer your question, I
> >>>> don’t know of a way to move this topic forward. We have more PRs than
> >>>> people to review them.
> >>>>
> >>>> Julian
> >>>>
> >>>>
> >>>>> On Jun 19, 2022, at 11:58 PM, Martijn Visser <martijnvis...@apache.org> wrote:
> >>>>>
> >>>>> Hi everyone,
> >>>>>
> >>>>> I just wanted to reach out to the Calcite community once more on this
> >>>>> topic, since no reply was received. It would be great if someone could
> >>>>> get back to us.
> >>>>>
> >>>>> Best regards,
> >>>>>
> >>>>> Martijn
> >>>>>
> >>>>> On Wed, Jun 8, 2022 at 11:24, Martijn Visser <martijnvis...@apache.org> wrote:
> >>>>>
> >>>>>> Hi everyone,
> >>>>>>
> >>>>>> I would like to follow up on this email that was sent by Jing. So far,
> >>>>>> no progress has been made, despite reaching out via the mailing list and
> >>>>>> the original Jira ticket, and contacting people directly. Is there a way
> >>>>>> that we can move this PR/topic forward?
> >>>>>>
> >>>>>> For context, in Apache Flink we currently use Calcite heavily. However,
> >>>>>> we are now at the stage where Calcite is actually holding us back. It
> >>>>>> would be great if we could find a way to strengthen our bond and move
> >>>>>> both Calcite and Flink forward.
> >>>>>>
> >>>>>> Looking forward to your thoughts,
> >>>>>>
> >>>>>> Martijn
> >>>>>>
> >>>>>> On 2022/01/26 07:05:37 Jing Zhang wrote:
> >>>>>>> Hi community,
> >>>>>>> My apologies for interrupting.
> >>>>>>> Could anyone help review the PR
> >>>>>>> https://github.com/apache/calcite/pull/2606?
> >>>>>>> Thanks a lot.
> >>>>>>>
> >>>>>>> CALCITE-4865 is the first sub-task of CALCITE-4864. This Jira aims to
> >>>>>>> extend the existing table function in order to support Polymorphic Table
> >>>>>>> Functions (PTF), which were introduced as part of ANSI SQL 2016.
> >>>>>>>
> >>>>>>> A brief change log of the PR:
> >>>>>>> - Update `Parser.jj` to support a PARTITION BY clause and an ORDER BY
> >>>>>>> clause for an input table with set semantics of a PTF
> >>>>>>> - Introduce `TableCharacteristics`, which contains three characteristics
> >>>>>>> of an input table of a table function
> >>>>>>> - Update `SqlTableFunction` to add a method `tableCharacteristics`; the
> >>>>>>> method returns the table characteristics for the ordinal-th argument to
> >>>>>>> this table function. The default return value is Optional.empty, which
> >>>>>>> means the ordinal-th argument is not a table.
> >>>>>>> - Introduce `SqlSetSemanticsTable`, which represents an input table with
> >>>>>>> set semantics of a table function; its `SqlKind` is `SET_SEMANTICS_TABLE`
> >>>>>>> - Update `SqlValidatorImpl` to validate that only a set semantics table
> >>>>>>> of a table function can have PARTITION BY and ORDER BY clauses
> >>>>>>> - Update `SqlToRelConverter#substituteSubQuery` to handle the subquery
> >>>>>>> that represents a set semantics table.
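To make the validation rule in the change log concrete, here is a minimal Java sketch. It assumes the three characteristics are the SQL:2016 ones (row vs. set semantics, prune vs. keep when empty, pass-through columns). `PtfSketch`, `TableCharacteristics`, `TableFunction`, `validate` and `example` are hypothetical stand-ins modeled on the description above, not Calcite's actual classes or signatures; the check only enforces that PARTITION BY / ORDER BY are accepted on a set semantics table argument.

```java
import java.util.Optional;

/** Hypothetical sketch of the set-semantics validation rule; not Calcite's API. */
public class PtfSketch {

    /** ROW: output depends on each row alone; SET: output depends on a partition of rows. */
    enum Semantics { ROW, SET }

    /** Hypothetical model of the three input-table characteristics (assumed from SQL:2016). */
    record TableCharacteristics(Semantics semantics, boolean pruneIfEmpty, boolean passColumnsThrough) {}

    /** Hypothetical table function; an argument with no characteristics is not a table. */
    interface TableFunction {
        default Optional<TableCharacteristics> tableCharacteristics(int ordinal) {
            return Optional.empty();
        }
    }

    /** Throws if PARTITION BY or ORDER BY is used on anything but a set semantics table argument. */
    static void validate(TableFunction fn, int ordinal, boolean hasPartitionBy, boolean hasOrderBy) {
        if (!hasPartitionBy && !hasOrderBy) {
            return; // the rule only constrains arguments that carry these clauses
        }
        TableCharacteristics tc = fn.tableCharacteristics(ordinal)
            .orElseThrow(() -> new IllegalArgumentException("argument " + ordinal + " is not a table"));
        if (tc.semantics() != Semantics.SET) {
            throw new IllegalArgumentException(
                "PARTITION BY / ORDER BY is only allowed on a set semantics table (argument " + ordinal + ")");
        }
    }

    /** Hypothetical function: argument 0 has set semantics, argument 1 row semantics, others are not tables. */
    static TableFunction example() {
        return new TableFunction() {
            @Override
            public Optional<TableCharacteristics> tableCharacteristics(int ordinal) {
                switch (ordinal) {
                    case 0:
                        return Optional.of(new TableCharacteristics(Semantics.SET, false, true));
                    case 1:
                        return Optional.of(new TableCharacteristics(Semantics.ROW, true, false));
                    default:
                        return Optional.empty();
                }
            }
        };
    }

    public static void main(String[] args) {
        TableFunction fn = example();
        validate(fn, 0, true, true); // accepted: set semantics table
        try {
            validate(fn, 1, true, false); // rejected: row semantics table
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The interface's default returns `Optional.empty()`, mirroring the default described in the change log, so any argument a function does not describe is treated as a non-table argument.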
> >>>>>>>
> >>>>>>> PR: https://github.com/apache/calcite/pull/2606
> >>>>>>> JIRA: https://issues.apache.org/jira/browse/CALCITE-4865
> >>>>>>> Parent JIRA: https://issues.apache.org/jira/browse/CALCITE-4864
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Jing Zhang
> >>>>>>>
> >>>>>>
> >>>>
> >>>>
> >>>
> >>
>
>
