I can do a hackathon if it helps. You're approaching it as a "big bang" but we could approach it as a burn down, counting the number of commits different between drill's fork and master. There's a lot of incremental value each commit we remove.
Julian > On Nov 17, 2015, at 08:58, Jacques Nadeau <[email protected]> wrote: > > Hey Jinfeng, > > Thanks for pulling this together. It is definitely time to get off the > fork. I see a number of items with PCM. Should we do a hackathon to do all > the PCMs one afternoon? Maybe we could convince Julian to join so we could > make short progress of the issues. The original commits that have PCM were > done by the following developers: Mehant, Jinfeng, Jacques, Venki & Aman. > Even if we just get Aman, Jinfeng and myself, we could probably close out > most of the gaps. (Mehant only has one and it seems pretty small. Venki's > is larger but is only one as well.) > > It seems like an achievable goal that we could get could get on top of > 1.5.x master + star + 2-3 other patches for the 1.4 release. What are your > thoughts? > > > > -- > Jacques Nadeau > CTO and Co-Founder, Dremio > >> On Tue, Nov 10, 2015 at 2:55 PM, Jinfeng Ni <[email protected]> wrote: >> >> Julian, >> >> Thanks a lot for your detailed comments! >> >> I totally agree with you on "calcite first" policy. That's why on MapR >> side, I recommend people get changes in Calcite master first, then >> cherry-pick back. Several recent patches follow this patten, as we >> realized the pain to maintain this calcite fork. >> >> I also agree that we should get them in ASAP and it is a good idea to >> start with simple patch firsts. >> >> >> 1. Why do you need "Add new method to ViewExpander interface to allow >> passing SchemaRoot."? Most ViewExpander implementations have a root >> schema. Note that the return type of that method has changed from >> RelNode to RelRoot. >> >> [ >> Venki might have better idea. As far as I remember, the patch was >> merged to handle case where the view is created in a different schema >> from the source schema. Drill uses Frameworks, not the regular code >> path that Calcite master uses. Not sure if that's the cause of the >> problem on Drill side. >> ] >> >> 2. "Match Logical Rel in ReduceExpressionRule." should be easy now we've >> parameterized the rules, and maybe you can get rid of "Enable >> inheriting from previously singleton filter and calc constant >> reduction rules.". >> >> [ Agree.] >> >> 3. "Disable the nullability cast checking in Filter." is worth discussing >> on the Calcite list. >> >> [ Will do that once I revise the patch with testcase.] >> >> 4. In "Return validated row type and validated SqlNode when call >> Planner.validate()." you should return a Pair<SqlNode, RelDataType>. >> >> [ The patch essentially returns Pair<SqlNode, RelDataType>, since it >> put SqlNode and RelDataType in a wrapper. With Calcite master, seems >> RelRoot probably is a better choice; it includes everything that we >> want. >> ] >> >> >> 5. To do "Use case-insensitive comparison when creating output row type >> of a JoinRel." properly we require a new config parameter saying >> whether internal field names are case-sensitive. >> >> Since "Don't use 'SumEmptyIsZero' (SUM0) window aggregate until >> CALCITE-777 is fixed." requires a fix to CALCITE-777 please contribute >> a fix for CALCITE-777! :) >> >> [Aman will have better idea] >> >> 6. Why do you need "Make public the constructor of >> SqlSumEmptyIsZeroAggFunction,"? >> >> [ It's used in Dril's ReduceAggregateRule. We should revisit this rule >> to see whether it's still required to keep this rule in stead of using >> Calcite's ReduceAggregateRule directly. >> ] >> >> 7. You don't need "Make certain push project constructors protected such >> that derived classes can access them." now we've parameterized >> ProjectFilterTransposeRule. >> >> [Agree.] >> >> 8. "DRILL-1455: Add return type-inference strategy for arithmetic >> operators when one of the arguments is ANY type." needs some >> discussion. What if both are ANY? How are other implementations of the >> operators (e.g. Calcite, Phoenix) going to implement the operator now >> that Drill's lax validation policy has allowed it in? >> >> [ Will discuss in Calcite list once we have the PR ready for this patch. >> ] >> >> 9. I can't see yet how the [StarColumn] changes could make it in. Will >> require some discussion. Note that CALCITE-546 has broken some of your >> assumptions. >> >> [ I know it would be big headache to resolve the changes related to >> star column for schema-on-read table. I have to re-visit those >> changes, and probably start a discussion on Calcite list once I do a >> rebase and code cleanup >> ] >> >> Again, thanks for your feedback. I feel it's the right time for the >> Drill community to try to get off calcite fork! >> >> Regards, >> >> Jinfeng >> >> >>> On Tue, Nov 10, 2015 at 1:51 PM, Julian Hyde <[email protected]> wrote: >>> Going forward, I think you should start a "calcite first" policy -- >>> get changes accepted into Calcite, with tests of course, then >>> cherry-pick into Drill's branch. Recent changes like "Allow a MAP >>> literal type." should have done that. And especially changes where you >>> could just parameterize planner rule. In other words, stop digging the >>> hole deeper. >>> >>> I think you should identify and do the easy ones ASAP. >>> >>> Rather than creating a branch for each of these contributions, if you >>> prefer you could create a pull request of the whole branch I think >>> that individual JIRAs can just reference a commit id that a Calcite >>> committer can cherry-pick. >>> >>> Now some remarks on specific changes. >>> >>> Why do you need "Add new method to ViewExpander interface to allow >>> passing SchemaRoot."? Most ViewExpander implementations have a root >>> schema. Note that the return type of that method has changed from >>> RelNode to RelRoot. >>> >>> "Match Logical Rel in ReduceExpressionRule." should be easy now we've >>> parameterized the rules, and maybe you can get rid of "Enable >>> inheriting from previously singleton filter and calc constant >>> reduction rules.". >>> >>> "Disable the nullability cast checking in Filter." is worth discussing >>> on the Calcite list. >>> >>> In "Return validated row type and validated SqlNode when call >>> Planner.validate()." you should return a Pair<SqlNode, RelDataType>. >>> >>> To do "Use case-insensitive comparison when creating output row type >>> of a JoinRel." properly we require a new config parameter saying >>> whether internal field names are case-sensitive. >>> >>> Since "Don't use 'SumEmptyIsZero' (SUM0) window aggregate until >>> CALCITE-777 is fixed." requires a fix to CALCITE-777 please contribute >>> a fix for CALCITE-777! :) >>> >>> Why do you need "Make public the constructor of >> SqlSumEmptyIsZeroAggFunction,"? >>> >>> You don't need "Make certain push project constructors protected such >>> that derived classes can access them." now we've parameterized >>> ProjectFilterTransposeRule. >>> >>> "DRILL-1455: Add return type-inference strategy for arithmetic >>> operators when one of the arguments is ANY type." needs some >>> discussion. What if both are ANY? How are other implementations of the >>> operators (e.g. Calcite, Phoenix) going to implement the operator now >>> that Drill's lax validation policy has allowed it in? >>> >>> I can't see yet how the [StarColumn] changes could make it in. Will >>> require some discussion. Note that CALCITE-546 has broken some of your >>> assumptions. >>> >>> Julian >>> >>> >>> >>>> On Tue, Nov 10, 2015 at 12:27 PM, Jinfeng Ni <[email protected]> wrote: >>>> Currently, Drill maintains a forked version of Calcite. Because of >>>> this, it has been a big headache to rebase onto the latest Calcite >>>> master branch every time when Calcite makes a new release. >>>> >>>> During today's Drill hangout, we discuss ideas around how to get Drill >>>> off the forked Calcite, by either pushing back patches in the forked >>>> version to Calcite master branch, or trying to see if those patches >>>> are still required for Drill to work properly. >>>> >>>> There are 49 commits in forked Calcite whose base is Calcite 1.4.0 >>>> release [1]. I put a spreadsheet for the current status of those 49 >>>> commits, as far as I know [2]. Half of them are the patches that were >>>> cherry-picked from Calcite master branch, 9 are just for version >>>> change, and the rest requires either push back or re-work in Drill >>>> side. >>>> >>>> I think we probably have to spend considerable effort to get this done >>>> ASAP. Otherwise, it will continue to be an issue for Drill. >>>> >>>> There are two separate work to be done: >>>> 1. Continue to rebase forked version onto latest Calcite master >>>> (Currently it's 1.5.0) >>>> 2. Push the patches in fork to Calcite master. Once all patches are >>>> in, we could finally get rid of fork and rebasing efforts. >>>> >>>> The spreadsheet has a column for sign-up. >>>> >>>> Thoughts? >>>> >>>> (let me know if you have problem accessing the spreadsheet). >>>> >>>> >>>> [1] https://github.com/dremio/calcite/tree/DrillCalcite1.4.0 >>>> [2] >> https://docs.google.com/spreadsheets/d/1Z0J5aMCBfqq_9SEl-LXsi_B8Ahaosygqke4cH6pPwmo/edit#gid=0 >>
