Martijn: I'd interpret Julian's response as welcoming you to contribute to Calcite :-)
Sounds like there is concretely room for:

* reviewing ( ex: test, comment in PR, but not actually merge -- might make
  it easier/quicker for the current committers to then allow them to address
  other/more things )
* security bug fixes ( ex: if addressed, then committers are freed up for
  additional reviewing/other )
* the usual laundry list of open source projects welcoming helping hands
  [ seems you know the drill via Flink ].

Cheers,
Austin

On Tue, Jun 21, 2022 at 10:57 AM Julian Hyde <jhyde.apa...@gmail.com> wrote:

> Please don’t fork Calcite.
>
> Calcite suffers from the tragedy of the commons. Unlike many open source
> data projects, there is no commercial project that directly maps to Calcite
> (even though Calcite is an essential part of many projects). As a result no
> engineers work full-time on Calcite.
>
> It takes more than pull requests to keep a project going. We need
> reviewers, people to work on releases, people to fix bugs (such as security
> bugs) that are important to everyone but urgent to no one.
>
> We have plenty of committers in Calcite, and add several more per year. We
> rely on those committers taking on their share of the housework, but the
> burden falls on too few people.
>
> Engineering managers need to start paying a little more for the “free
> lunch” that they enjoy when Calcite “just works” in their project. Sadly,
> most engineering managers are not subscribed to this list.
>
> Julian
>
> > On Jun 21, 2022, at 9:49 AM, Jacques Nadeau <jacq...@apache.org> wrote:
> >
> > Martijn, thanks for sharing that thread in the Flink community.
> >
> > I'm someone who has forked Calcite twice: once in Apache Drill and again
> > in Dremio. In both cases, it was all about trading short term benefits
> > against long term costs. In both cases, I think the net amount of work was
> > probably 5x as much as what it would have been if we had just done a
> > better job engaging the community.
> > If I were to state the curve of behavior over six years, I'd guess that
> > in both cases the numbers of effort looked like this:
> >
> > estimated effort doing high intensity integration with calcite (years 1-6)
> > fork: 1, 5, 10, 50, 100, 200, total = 366
> > non-fork: 10, 10, 10, 10, 10, total = 50
> >
> > So yes, the first couple years you're ahead. But you pay a massive
> > technical debt premium long term. Early in a project (Drill) or company's
> > life (Dremio), it can make sense to sacrifice long term for short term but
> > it's important people do it with their eyes open.
> >
> > The reason that this pain is so high is that as your codebases diverge,
> > you start having to do everything the Calcite community does by yourself.
> > Backports become harder and things that you need (e.g. new sql syntax,
> > etc) have to be reimplemented (even if someone else already implemented
> > them in some post-fork Calcite version). Ultimately, at some point you
> > realize that your path is untenable and you unfork. This becomes the
> > biggest expense of them all and I believe both of those teams are still
> > trying to un-fork. The additional thing that becomes an even bigger
> > problem is that your absence from the Calcite community means that people
> > may take the project or APIs in ways that are in direct conflict to how
> > you use the library. Since you're not active in the project, you fail to
> > provide a counterpoint and then you're basically just in a miserable
> > place. The Hive project did this best by ensuring that releases of Calcite
> > were also run pre-release against Hive to make sure no major regressions
> > occurred. By being in the community and active, this is the best state
> > from my pov. (It makes your project better and Calcite better.)
> >
> > Two last notes:
> > - I'm not sure the rocks fork is comparable to forking Calcite. The api
> > surface area and community models are very different.
> > - This is all based on a high intensity integration (using rules +
> > planner or sql + rules + planner). Calcite is frustratingly monolithic and
> > if someone was only going to use a small component, my opinion would
> > likely be very different.
> >
> > I'd send this to the Flink list but I'm not subscribed. It'd be great if
> > you shared it with the people over there if you think they'd find it
> > useful.
> >
> > On Tue, Jun 21, 2022 at 12:31 AM Martijn Visser <martijnvis...@apache.org>
> > wrote:
> >
> >> Thanks Julian and Austin!
> >>
> >> Any reply to kick-off some sort of discussion is worthwhile :D
> >> I definitely know the feeling of having more PRs open than you would
> >> like, looking at https://github.com/apache/flink/pulls :)
> >>
> >> There have been discussions in the Flink community about forking Calcite
> >> [1]. My personal preference at the moment is to see if we can create a
> >> better collaboration and community. I believe that we can find people
> >> from the Flink community who can open / help review Calcite PRs that
> >> are interesting for the Flink community. The question is if that will
> >> also help short term since in the end it still requires a Calcite
> >> maintainer to review/merge.
> >>
> >> Best regards,
> >>
> >> Martijn
> >>
> >> [1] https://lists.apache.org/thread/1oqydpsm4mc55bkk440gx9lr9gf2rvf4
> >>
> >> On Mon, Jun 20, 2022 at 23:51, Austin Bennett
> >> <whatwouldausti...@gmail.com> wrote:
> >>
> >>> From the peanut gallery :-) -->
> >>>
> >>> Wow; yes, lots of open PRs. https://github.com/apache/calcite/pulls
> >>>
> >>> How can individuals from the Flink [sub-]community, and/or more general
> >>> calcite community help lighten this load? Is there much weight given to
> >>> reviews from non-committers; how to increase the # of people capable of
> >>> providing worthwhile reviews [ that are recognized as such ]?
> >>>
> >>> On Mon, Jun 20, 2022 at 11:47 AM Julian Hyde <jhyde.apa...@gmail.com>
> >>> wrote:
> >>>
> >>>> Martijn,
> >>>>
> >>>> Since you requested a reply, I am replying. To answer your question, I
> >>>> don’t know of a way to move this topic forward. We have more PRs than
> >>>> people to review them.
> >>>>
> >>>> Julian
> >>>>
> >>>>> On Jun 19, 2022, at 11:58 PM, Martijn Visser
> >>>>> <martijnvis...@apache.org> wrote:
> >>>>>
> >>>>> Hi everyone,
> >>>>>
> >>>>> I just wanted to reach out to the Calcite community once more on this
> >>>>> topic since no reply was received. Would be great if someone could get
> >>>>> back to us.
> >>>>>
> >>>>> Best regards,
> >>>>>
> >>>>> Martijn
> >>>>>
> >>>>> On Wed, Jun 8, 2022 at 11:24, Martijn Visser
> >>>>> <martijnvis...@apache.org> wrote:
> >>>>>
> >>>>>> Hi everyone,
> >>>>>>
> >>>>>> I would like to follow up on this email that was sent by Jing. So
> >>>>>> far, no progress has been made, despite reaching out to the mailing
> >>>>>> list, the original Jira ticket and reaching out to people directly.
> >>>>>> Is there a way that we can move this PR/topic forward?
> >>>>>>
> >>>>>> For context, in Apache Flink we're currently heavily using Calcite.
> >>>>>> However, we are now at the stage where Calcite is actually holding us
> >>>>>> back. It would be great if we can find a way to strengthen our bond
> >>>>>> and move both Calcite and Flink forward.
> >>>>>>
> >>>>>> Looking forward to your thoughts,
> >>>>>>
> >>>>>> Martijn
> >>>>>>
> >>>>>> On 2022/01/26 07:05:37 Jing Zhang wrote:
> >>>>>>> Hi community,
> >>>>>>> My apologies for interrupting.
> >>>>>>> Could anyone help review the PR
> >>>>>>> https://github.com/apache/calcite/pull/2606?
> >>>>>>> Thanks a lot.
> >>>>>>>
> >>>>>>> CALCITE-4865 is the first sub-task of CALCITE-4864.
> >>>>>>> This Jira aims to extend the existing table function support to
> >>>>>>> cover Polymorphic Table Functions (PTF), which were introduced as
> >>>>>>> part of ANSI SQL 2016.
> >>>>>>>
> >>>>>>> The brief change log of the PR is:
> >>>>>>> - Update `Parser.jj` to support the partition by clause and order by
> >>>>>>> clause for an input table with set semantics of a PTF
> >>>>>>> - Introduce `TableCharacteristics`, which contains three
> >>>>>>> characteristics of an input table of a table function
> >>>>>>> - Update `SqlTableFunction` to add a method `tableCharacteristics`;
> >>>>>>> the method returns the table characteristics for the ordinal-th
> >>>>>>> argument to this table function. The default return value is
> >>>>>>> Optional.empty, which means the ordinal-th argument is not a table.
> >>>>>>> - Introduce `SqlSetSemanticsTable`, which represents an input table
> >>>>>>> with set semantics of a table function; its `SqlKind` is
> >>>>>>> `SET_SEMANTICS_TABLE`
> >>>>>>> - Update `SqlValidatorImpl` to validate that only a set semantics
> >>>>>>> table of a table function can have a partition by and order by clause
> >>>>>>> - Update `SqlToRelConverter#substituteSubQuery` to parse a subQuery
> >>>>>>> which represents a set semantics table.
> >>>>>>>
> >>>>>>> PR: https://github.com/apache/calcite/pull/2606
> >>>>>>> JIRA: https://issues.apache.org/jira/browse/CALCITE-4865
> >>>>>>> Parent JIRA: https://issues.apache.org/jira/browse/CALCITE-4864
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Jing Zhang