Hi everyone,

This is an awesome discussion on improving collaboration between different projects. Thanks to Julian, Jacques, Austin, Martijn, and Timo for their efforts to make it happen.

Best,
Jing Zhang

On Thu, Jun 23, 2022 at 01:43, Martijn Visser <martijnvis...@apache.org> wrote:

> Hi Jacques, Julian, Austin and everyone else,
>
> Thank you very much for sharing all your experiences and providing really
> valuable input. I'll definitely relay this back to the original discussion
> thread in the Flink community. Part of the reason for bringing this
> information back to the Flink community is that I feel the only way
> different OSS solutions can help each other forward is by communicating and
> collaborating. As Timo already mentioned, he'll try to help out. Let's try
> to get some more people involved.
>
> Side note: I also saw that this thread got some traction on Twitter [1] on
> the cost of forking.
>
> Best regards,
>
> Martijn
>
> [1]
> https://twitter.com/gunnarmorling/status/1539499415337111553?s=21&t=8fGk3PxScOx4FJPJWE5UeA
>
> On Wed, Jun 22, 2022 at 09:29, Timo Walther <twal...@apache.org> wrote:
>
> > Hi everyone,
> >
> > This is a really great discussion. Thanks for starting it, Martijn, and
> > for your input, Jacques! I have been fighting against forking Calcite in
> > Flink for years already. Even when merging forks of Flink that
> > transitively forked Calcite, in the end we were able to resolve
> > conflicts / contribute blockers back into Calcite. And I strongly
> > believe that this is the better approach for long-term success for both
> > projects.
> >
> > I would like to get more involved in the Calcite community. I have been
> > implementing and managing Flink SQL based on Calcite since 2016. Thus, I
> > feel confident to say that I know the code base and some quirks in the
> > stack very well.
> >
> > Capacity-wise I will try to reserve some time for helping the Calcite
> > community. Happy to get some pointers where and how I can help.
> >
> > I will take a look at https://github.com/apache/calcite/pull/2606 this
> > week to get the ball rolling, as this is an important addition that
> > prepares for "customer SQL operators" in Flink SQL.
> >
> > Regards,
> > Timo
> >
> > On 21.06.22 22:18, Charles Givre wrote:
> > > As the PMC for Apache Drill, I'd echo everyone's comments here.... Don't
> > > fork. Don't do it.
> > >
> > > Apache Drill forked Calcite several years ago, when Calcite was on
> > > version 1.20 or 1.21. While this meant that some bugs were easily fixed,
> > > it also meant that as our fork diverged from "regular" Calcite, it
> > > became harder and harder to maintain. It also meant that we were chasing
> > > bugs that had since been fixed.
> > >
> > > Drill is in the process of "de-forking" Calcite, meaning that we're
> > > ditching our fork and re-integrating with standard Calcite. It has been A
> > > TON of work and we have contributed (and will continue to contribute) bug
> > > fixes and PRs to Calcite. In the long run, I think this will be beneficial
> > > for both communities.
> > >
> > > Best,
> > > -- C
> > >
> > >
> > >> On Jun 21, 2022, at 1:57 PM, Julian Hyde <jhyde.apa...@gmail.com> wrote:
> > >>
> > >> Please don’t fork Calcite.
> > >>
> > >> Calcite suffers from the tragedy of the commons. Unlike many open
> > >> source data projects, there is no commercial project that directly maps to
> > >> Calcite (even though Calcite is an essential part of many projects). As a
> > >> result no engineers work full-time on Calcite.
> > >>
> > >> It takes more than pull requests to keep a project going. We need
> > >> reviewers, people to work on releases, people to fix bugs (such as security
> > >> bugs) that are important to everyone but urgent to no one.
> > >>
> > >> We have plenty of committers in Calcite, and add several more per year.
> > >> We rely on those committers taking on their share of the housework, but the
> > >> burden falls on too few people.
> > >>
> > >> Engineering managers need to start paying a little more for the “free
> > >> lunch” that they enjoy when Calcite “just works” in their project. Sadly,
> > >> most engineering managers are not subscribed to this list.
> > >>
> > >> Julian
> > >>
> > >>
> > >>> On Jun 21, 2022, at 9:49 AM, Jacques Nadeau <jacq...@apache.org> wrote:
> > >>>
> > >>> Martijn, thanks for sharing that thread in the Flink community.
> > >>>
> > >>> I'm someone who has forked Calcite twice: once in Apache Drill and again in
> > >>> Dremio. In both cases, it was all about trading short term benefits against
> > >>> long term costs. In both cases, I think the net amount of work was probably
> > >>> 5x as much as what it would have been if we had just done a better job
> > >>> engaging the community. If I were to sketch the effort curve over six
> > >>> years, I'd guess that in both cases the numbers looked like this:
> > >>>
> > >>> estimated effort doing high intensity integration with calcite (years 1-6)
> > >>> fork: 1, 5, 10, 50, 100, 200, total = 366
> > >>> non-fork: 10, 10, 10, 10, 10, total = 50
> > >>>
> > >>> So yes, the first couple years you're ahead. But you pay a massive
> > >>> technical debt premium long term. Early in a project's (Drill) or company's
> > >>> (Dremio) life, it can make sense to sacrifice long term for short term, but
> > >>> it's important people do it with their eyes open.
> > >>>
> > >>> The reason that this pain is so high is that as your codebases diverge, you
> > >>> start having to do everything the Calcite community does by yourself.
> > >>> Backports become harder and things that you need (e.g. new sql syntax, etc)
> > >>> have to be reimplemented (even if someone else already implemented them in
> > >>> some post-fork Calcite version). Ultimately, at some point you realize that
> > >>> your path is untenable and you unfork. This becomes the biggest expense of
> > >>> them all and I believe both of those teams are still trying to un-fork.
> > >>> The additional thing that becomes an even bigger problem is that your
> > >>> absence from the Calcite community means that people may take the project
> > >>> or APIs in directions that directly conflict with how you use the library.
> > >>> Since you're not active in the project, you fail to provide a counterpoint
> > >>> and then you're basically just in a miserable place. The Hive project did
> > >>> this best by ensuring that releases of Calcite were also run pre-release
> > >>> against Hive to make sure no major regressions occurred. Being in the
> > >>> community and active is the best state from my pov. (It makes your project
> > >>> better and Calcite better.)
> > >>>
> > >>> Two last notes:
> > >>> - I'm not sure the rocks fork is comparable to forking Calcite. The api
> > >>> surface area and community models are very different.
> > >>> - This is all based on a high intensity integration (using rules + planner
> > >>> or sql + rules + planner). Calcite is frustratingly monolithic and if
> > >>> someone was only going to use a small component, my opinion would likely be
> > >>> very different.
> > >>>
> > >>> I'd send this to the Flink list but I'm not subscribed. It'd be great if
> > >>> you shared it with the people over there if you think they'd find it useful.
> > >>>
> > >>>
> > >>>
> > >>> On Tue, Jun 21, 2022 at 12:31 AM Martijn Visser <martijnvis...@apache.org>
> > >>> wrote:
> > >>>
> > >>>> Thanks Julian and Austin!
> > >>>>
> > >>>> Any reply to kick off some sort of discussion is worthwhile :D
> > >>>> I definitely know the feeling of having more PRs open than you would like,
> > >>>> looking at https://github.com/apache/flink/pulls :)
> > >>>>
> > >>>> There have been discussions in the Flink community about forking Calcite
> > >>>> [1]. My personal preference at the moment is to see if we can create a
> > >>>> better collaboration and community.
> > >>>> I believe that we can find people from the Flink community who can open
> > >>>> or help review Calcite PRs that are interesting for the Flink community.
> > >>>> The question is whether that will also help in the short term, since in
> > >>>> the end it still requires a Calcite maintainer to review/merge.
> > >>>>
> > >>>> Best regards,
> > >>>>
> > >>>> Martijn
> > >>>>
> > >>>> [1] https://lists.apache.org/thread/1oqydpsm4mc55bkk440gx9lr9gf2rvf4
> > >>>>
> > >>>>
> > >>>> On Mon, Jun 20, 2022 at 23:51, Austin Bennett
> > >>>> <whatwouldausti...@gmail.com> wrote:
> > >>>>
> > >>>>> From the peanut gallery :-) -->
> > >>>>>
> > >>>>> Wow; yes, lots of open PRs. https://github.com/apache/calcite/pulls
> > >>>>>
> > >>>>> How can individuals from the Flink [sub-]community, and/or the more
> > >>>>> general Calcite community, help lighten this load? Is there much weight
> > >>>>> given to reviews from non-committers; how to increase the # of people
> > >>>>> capable of providing worthwhile reviews [ that are recognized as such ]?
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On Mon, Jun 20, 2022 at 11:47 AM Julian Hyde <jhyde.apa...@gmail.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> Martijn,
> > >>>>>>
> > >>>>>> Since you requested a reply, I am replying. To answer your question, I
> > >>>>>> don’t know of a way to move this topic forward. We have more PRs than
> > >>>>>> people to review them.
> > >>>>>>
> > >>>>>> Julian
> > >>>>>>
> > >>>>>>
> > >>>>>>> On Jun 19, 2022, at 11:58 PM, Martijn Visser
> > >>>>>>> <martijnvis...@apache.org> wrote:
> > >>>>>>>
> > >>>>>>> Hi everyone,
> > >>>>>>>
> > >>>>>>> I just wanted to reach out to the Calcite community once more on this
> > >>>>>>> topic since no reply was received. Would be great if someone could get
> > >>>>>>> back to us.
> > >>>>>>>
> > >>>>>>> Best regards,
> > >>>>>>>
> > >>>>>>> Martijn
> > >>>>>>>
> > >>>>>>> On Wed, Jun 8, 2022 at 11:24, Martijn Visser
> > >>>>>>> <martijnvis...@apache.org> wrote:
> > >>>>>>>
> > >>>>>>>> Hi everyone,
> > >>>>>>>>
> > >>>>>>>> I would like to follow up on this email that was sent by Jing. So far,
> > >>>>>>>> no progress has been made, despite reaching out to the mailing list,
> > >>>>>>>> the original Jira ticket and reaching out to people directly. Is there
> > >>>>>>>> a way that we can move this PR/topic forward?
> > >>>>>>>>
> > >>>>>>>> For context, in Apache Flink we're currently heavily using Calcite.
> > >>>>>>>> However, we are now at the stage where Calcite is actually holding us
> > >>>>>>>> back. It would be great if we can find a way to strengthen our bond
> > >>>>>>>> and move both Calcite and Flink forward.
> > >>>>>>>>
> > >>>>>>>> Looking forward to your thoughts,
> > >>>>>>>>
> > >>>>>>>> Martijn
> > >>>>>>>>
> > >>>>>>>> On 2022/01/26 07:05:37 Jing Zhang wrote:
> > >>>>>>>>> Hi community,
> > >>>>>>>>> My apologies for interrupting.
> > >>>>>>>>> Could anyone help review the PR
> > >>>>>>>>> https://github.com/apache/calcite/pull/2606?
> > >>>>>>>>> Thanks a lot.
> > >>>>>>>>>
> > >>>>>>>>> CALCITE-4865 is the first sub-task of CALCITE-4864. This Jira aims to
> > >>>>>>>>> extend the existing table function support to cover Polymorphic Table
> > >>>>>>>>> Functions (PTF), which were introduced as part of ANSI SQL 2016.
> > >>>>>>>>>
> > >>>>>>>>> The brief change logs of the PR are:
> > >>>>>>>>> - Update `Parser.jj` to support the partition by and order by
> > >>>>>>>>> clauses for an input table with set semantics of a PTF
> > >>>>>>>>> - Introduce `TableCharacteristics`, which contains three
> > >>>>>>>>> characteristics of an input table of a table function
> > >>>>>>>>> - Update `SqlTableFunction` to add a method `tableCharacteristics`;
> > >>>>>>>>> the method returns the table characteristics for the ordinal-th
> > >>>>>>>>> argument to this table function. The default return value is
> > >>>>>>>>> `Optional.empty`, which means the ordinal-th argument is not a table.
> > >>>>>>>>> - Introduce `SqlSetSemanticsTable`, which represents an input table
> > >>>>>>>>> with set semantics of a table function; its `SqlKind` is
> > >>>>>>>>> `SET_SEMANTICS_TABLE`
> > >>>>>>>>> - Update `SqlValidatorImpl` to validate that only a set-semantics
> > >>>>>>>>> table argument of a table function can have partition by and order by
> > >>>>>>>>> clauses
> > >>>>>>>>> - Update `SqlToRelConverter#substituteSubQuery` to parse the subquery
> > >>>>>>>>> that represents a set semantics table.
> > >>>>>>>>>
> > >>>>>>>>> PR: https://github.com/apache/calcite/pull/2606
> > >>>>>>>>> JIRA: https://issues.apache.org/jira/browse/CALCITE-4865
> > >>>>>>>>> Parent JIRA: https://issues.apache.org/jira/browse/CALCITE-4864
> > >>>>>>>>>
> > >>>>>>>>> Best,
> > >>>>>>>>> Jing Zhang
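
For readers unfamiliar with the `tableCharacteristics` contract described in the change log above, here is a minimal, self-contained sketch of the idea. It is not the Calcite API: `TableFunction`, `Characteristics`, `Semantics`, `Sessionize` and `allowsPartitionBy` are hypothetical stand-ins invented for illustration, and the three fields (semantics, prune-if-empty, pass-columns-through) are an assumption based on how SQL:2016 describes table arguments of polymorphic table functions. Only the "empty Optional means the argument is not a table" default and the "partition by / order by only on set-semantics tables" rule come from the message itself.

```java
import java.util.Optional;

public class TableCharacteristicsSketch {

    /** Whether a table argument is processed row by row or as partitioned sets. */
    enum Semantics { ROW, SET }

    /** Hypothetical stand-in for the PR's TableCharacteristics (field names are assumptions). */
    static final class Characteristics {
        final Semantics semantics;
        final boolean pruneIfEmpty;
        final boolean passColumnsThrough;

        Characteristics(Semantics semantics, boolean pruneIfEmpty, boolean passColumnsThrough) {
            this.semantics = semantics;
            this.pruneIfEmpty = pruneIfEmpty;
            this.passColumnsThrough = passColumnsThrough;
        }
    }

    /** Hypothetical stand-in for SqlTableFunction. */
    interface TableFunction {
        /** Empty by default, meaning the ordinal-th argument is not a table. */
        default Optional<Characteristics> tableCharacteristics(int ordinal) {
            return Optional.empty();
        }
    }

    /** Illustrative function whose argument 0 is a table with set semantics. */
    static final class Sessionize implements TableFunction {
        @Override
        public Optional<Characteristics> tableCharacteristics(int ordinal) {
            return ordinal == 0
                ? Optional.of(new Characteristics(Semantics.SET, true, false))
                : Optional.empty();
        }
    }

    /** Validator-style rule: partition by / order by is only allowed on set-semantics table arguments. */
    static boolean allowsPartitionBy(TableFunction fn, int ordinal) {
        return fn.tableCharacteristics(ordinal)
            .map(c -> c.semantics == Semantics.SET)
            .orElse(false);
    }

    public static void main(String[] args) {
        TableFunction sessionize = new Sessionize();
        System.out.println(allowsPartitionBy(sessionize, 0)); // argument 0: set-semantics table
        System.out.println(allowsPartitionBy(sessionize, 1)); // argument 1: not a table
    }
}
```

Returning an empty `Optional` by default means existing table functions need no changes: they simply report that none of their arguments are tables, and a validator can reject partition by / order by on them.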