Regarding CALCITE-4123, I can provide a bit more of context. I have been working on some improvements in EnumerableMergeJoin (support new join types). I found that the easiest way to do so was extending the class as MyOwnEnumerableMergeJoin and override the "implement" method, so that instead of calling EnumerableDefault's mergeJoin, it calls my own mergeJoin method, where I can develop and test these evolutions. Eventually, once this code is satisfactory, I will push the improvements upstream, so that they will be part of the following release (if the community agrees), and I can get rid of MyOwnEnumerableMergeJoin and use the default one instead. But this is a process that takes some time, so for a certain period I have the improvements in my project via MyOwnEnumerableMergeJoin, which is fine. Today this is not possible, because EnumerableMergeJoin is not extensible (due to the package-private constructor) so I need to create MyOwnEnumerableMergeJoin "from scratch" (extending Join) and copy-pasting most of the code from EnumerableMergeJoin. So it is possible to achieve my goal, but this is (IMHO) not optimal. Moreover, I find it hard to understand why one could use this extension-approach on EnumerableHashJoin or EnumerableNestedLoopJoin (with protected constructors), but not with EnumerableMergeJoin (with package-private constructor). Why is MergeJoin so special?
Recently I found a different example of extending enumerable operators. I started to test the topDown mechanism in Volcano Planner (great feature btw), and I found myself in a position where I had to extend EnumerableCalc to exploit this feature as much as possible. Basically I had to create MyOwnEnumerableCalc (which extends EnumerableCalc) to override the passThroughTraits method. The reason is that in my project I can produce an IndexScan with some "special" collations (sub-fields on a struct field), which are not supported by Calcite's standard collation mechanism (and therefore were not passed through by EnumerableCalc#passThroughTraits to a potential underlying Scan). The solution for that was simple, create MyOwnEnumerableCalc and override passThroughTraits to provide my own implementation considering also the specificities of my special collations, and everything works perfectly. This is a very particular scenario that will not be pushed upstream into standard Calcite, so probably I will have to live with MyOwnEnumerableCalc, which I think is totally fine. And in this case I had no problem with extension, because EnumerableCalc provides a public constructor. Honesty, I never thought this change would cause such a big discussion, I will be more careful in the future and provide arguments in modifications like this one, before going on with the change. I do not want to block 1.24 on such a minor thing, if the community does not agree with my change, or we need more time for discussion, I can revert the change and re-consider it for 1.25. Personally, with the reasons that I have provided, I think we can keep the change and release it in 1.24. Best, Ruben Le ven. 17 juil. 2020 à 08:24, Julian Hyde <[email protected]> a écrit : > When I write an API and do not include the word 'protected' on a > constructor I am saying 'this is not a public API'. So someone coming > along and adding the word 'protected' is actually making a big change. > > There are other ways to extend code than sub-classing. I don't know > what use case Ruben has for extending EnumerableMergeJoin. I never > will, because I can't see his code. EnumerableMergeJoin, like all > RelNodes, is extensible by new RexNode operators, new types, new kinds > of collation, and by composing with other RelNodes. With any of those > other extension mechanisms, Calcite's language would have been richer. > There is a lot to be said for keeping the building blocks fixed, and > creating a new extension point only when we're tried everything else. > > On Thu, Jul 16, 2020 at 3:49 PM Stamatis Zampetakis <[email protected]> > wrote: > > > > +1 for removing the final modifier for all the reasons mentioned so far. > > > > Regarding CALCITE-4123, I find Ruben's justification convincing. > > The majority of Enumerable operators (not only joins) are extensible > (with > > public/protected constructors) so it seems that EnumerableMergeJoin was > the > > only exception to this pattern. > > Now if we want to keep these APIs internal it is another discussion and I > > think it should consider all Enumerable operators so for the moment I am > > fine with the current change. > > The only question from my side is why it is necessary to extend > > EnumerableMergeJoin in the first place but this is mostly from curiosity, > > and to see if there are things that should be changed in the operator to > > avoid such extensions. > > Let's continue the discussion under the respective JIRA if necessary. > > > > Best, > > Stamatis > > > > > > On Fri, Jul 17, 2020 at 1:30 AM Rui Wang <[email protected]> wrote: > > > > > Since the concern on CALCLITE-4123 is brought up, can we make a quick > > > decision about whether to make it in the 1.24.0 release or not? > > > > > > Having the public interface increased in 1.24.0 and then later there > might > > > be a decision to revert it doesn't sound right. If we are not sure for > now, > > > maybe have a quick revert to unblock 1.24.0 release first? > > > > > > > > > -Rui > > > > > > On Thu, Jul 16, 2020 at 3:17 PM Ruben Q L <[email protected]> wrote: > > > > > > > Thanks for the feedback, I will create a Jira ticket to carry out the > > > > change and remove the "final" from AbstractRelNode#getRelTypeName. > > > > > > > > @Julian, I am sorry I committed that change ( > > > > https://issues.apache.org/jira/browse/CALCITE-4123) without further > > > > discussion. I honestly thought it was a minor detail. We can discuss > it, > > > > and if the community does not agree with the change, we can revert > it. > > > > Basically, the package-private constructor in EnumerableMergeJoin > > > prevents > > > > it from being extended in downstream projects. This is not consistent > > > with > > > > other Join operators, such as EnumerableHashJoin, > > > EnumerableNestedLoopJoin > > > > or EnumerbleBatchNestedLoopJoin, all of them providing a protected > > > > constructor. So I do not see a reason why a downstream project can > extend > > > > e.g. EnumerableHashJoin, but is unable to extend > EnumerableMergeJoin. I > > > > think there is no issue in extending these clases (otherwise they > would > > > be > > > > final). I believed changing the constructor from package-private to > > > > protected would solve this issue, without causing any > > > > problematic side-effect. > > > > > > > > Best, > > > > Ruben > > > > > > > > > > > > Le jeu. 16 juil. 2020 à 21:32, Julian Hyde <[email protected]> a > écrit : > > > > > > > > > I don't have a problem with this change. > > > > > > > > > > I do have a problem with changes, such as the one that Ruben > recently > > > > > committed [1] to make the constructor of EnumerableMergeJoin, that > > > > > increase the surface of our public API. The larger our public API, > the > > > > > more work it is to add features to Calcite while adhering to > semantic > > > > > versioning. > > > > > > > > > > 'final' is sometimes used intentionally to limit the API surface > area. > > > > > > > > > > Julian > > > > > > > > > > [1] > > > > > > > > > > > > > https://github.com/apache/calcite/commit/6bb7e2d0b65ec3c7a0d82c92ca0564f8caec4af5 > > > > > > > > > > On Thu, Jul 16, 2020 at 10:33 AM Rui Wang <[email protected]> > > > wrote: > > > > > > > > > > > > +1 to make getRelTypeName overridable. > > > > > > > > > > > > Not a java language expert, to me when there are cases that a > public > > > > API > > > > > > with final cannot solve, it's a good sign to remove that final to > > > allow > > > > > > customization to solve those specific use cases. > > > > > > > > > > > > > > > > > > -Rui > > > > > > > > > > > > On Thu, Jul 16, 2020 at 8:58 AM Haisheng Yuan <[email protected]> > > > > wrote: > > > > > > > > > > > > > +1 to remove final. > > > > > > > > > > > > > > The method returns the logical/physical operator name. When we > > > > explain > > > > > the > > > > > > > plan (and compute the digest in 1.23.0 and before), > getRelTypeName > > > > may > > > > > be > > > > > > > called multiple times for a single operator, every time it will > > > start > > > > > from > > > > > > > scratch to figure out what is correct operator name. Making it > > > > > overridable > > > > > > > will not only solve the case Ruben mentioned below, but also > remove > > > > the > > > > > > > duplicate computation by just returning the literal string for > the > > > > > operator > > > > > > > name, even if the gain might be negligible. > > > > > > > > > > > > > > The downside is when downstream projects who override the > method > > > > > refactor > > > > > > > the operator class name, they have to manually update the > method to > > > > > return > > > > > > > correct names. We get a flexibility by losing another one. > But I > > > > > think it > > > > > > > is OK to add a caveat that use at your own risk. > > > > > > > > > > > > > > On 2020/07/16 14:32:31, Ruben Q L <[email protected]> wrote: > > > > > > > > Hello everyone, > > > > > > > > > > > > > > > > I have a small issue regarding RelNode#getRelTypeName [1]. > This > > > > > method is > > > > > > > > used when explaining a plan (e.g. via RelOptUtil#dumpPlan), > and > > > it > > > > > > > "returns > > > > > > > > the name of this relational expression's class, sans package > > > name, > > > > > for > > > > > > > use > > > > > > > > in explain". > > > > > > > > This method has a *final* implementation in AbstractRelNode > [2], > > > > > which > > > > > > > > returns a String based on getClass().getName(). Normally, > this > > > > > > > > implementation should work fine for all RelNode > implementations, > > > > but > > > > > > > > currently I am working on a project that uses obfuscation, so > > > this > > > > > > > > implementation returns a non-meaningful name on our RelNode > > > > > > > > implementations (e.g. our TableScan operators), which will > lead > > > to > > > > > > > > incomprehensible query plans in our logs. I would like to > > > override > > > > > the > > > > > > > > method on our RelNode implementations, so that they can > return a > > > > > "clear" > > > > > > > > relTypeName, instead of the default one based on > > > > > getClass().getName(), > > > > > > > but > > > > > > > > at the moment I cannot do that, since the implementation in > > > > > > > AbstractRelNode > > > > > > > > is final. > > > > > > > > > > > > > > > > What would you think about removing the "final" from > > > > > > > > AbstractRelNode#getRelTypeName and allow downstream projects > to > > > > > provide > > > > > > > > their own implementation? Of course, these projects shall be > > > > > responsible > > > > > > > > for providing a valid implementation at their own risk, we > could > > > > even > > > > > > > add a > > > > > > > > warning message in AbstractRelNode#getRelTypeName saying > > > something > > > > > like > > > > > > > "do > > > > > > > > not override this method unless you know what you are doing" > or > > > "it > > > > > is > > > > > > > > strongly discouraged to override this method". > > > > > > > > > > > > > > > > Of course, I know there are other alternatives for my > problem, > > > like > > > > > > > > implementing my own dumpPlan and RelWriter mechanisms, but I > > > would > > > > > like > > > > > > > to > > > > > > > > explore the getRelTypeName possibility before doing that, > because > > > > it > > > > > > > would > > > > > > > > be a more straightforward solution, and it might be useful > for > > > > other > > > > > > > people > > > > > > > > as well. > > > > > > > > > > > > > > > > Thanks and best regards, > > > > > > > > Ruben > > > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/calcite/blob/551e3f5a182a48e20586e40c378224bb63e4bfb3/core/src/main/java/org/apache/calcite/rel/RelNode.java#L367 > > > > > > > > [2] > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/calcite/blob/551e3f5a182a48e20586e40c378224bb63e4bfb3/core/src/main/java/org/apache/calcite/rel/AbstractRelNode.java#L182 > > > > > > > > > > > > > > > > > > > > > > > > > > > >
