Agree with Ruben. I don't see any convincing reasons to make EnumerableMergeJoin special.
On 2020/07/17 07:46:35, Ruben Q L <[email protected]> wrote: > Regarding CALCITE-4123, I can provide a bit more of context. > > I have been working on some improvements in EnumerableMergeJoin (support > new join types). I found that the easiest way to do so was extending the > class as MyOwnEnumerableMergeJoin and override the "implement" method, so > that instead of calling EnumerableDefault's mergeJoin, it calls my own > mergeJoin method, where I can develop and test these evolutions. > Eventually, once this code is satisfactory, I will push the improvements > upstream, so that they will be part of the following release (if the > community agrees), and I can get rid of MyOwnEnumerableMergeJoin and use > the default one instead. But this is a process that takes some time, so for > a certain period I have the improvements in my project via > MyOwnEnumerableMergeJoin, which is fine. Today this is not possible, > because EnumerableMergeJoin is not extensible (due to the package-private > constructor) so I need to create MyOwnEnumerableMergeJoin "from scratch" > (extending Join) and copy-pasting most of the code from > EnumerableMergeJoin. So it is possible to achieve my goal, but this is > (IMHO) not optimal. Moreover, I find it hard to understand why one could > use this extension-approach on EnumerableHashJoin or > EnumerableNestedLoopJoin (with protected constructors), but not with > EnumerableMergeJoin (with package-private constructor). Why is MergeJoin so > special? > > Recently I found a different example of extending enumerable operators. I > started to test the topDown mechanism in Volcano Planner (great feature > btw), and I found myself in a position where I had to extend EnumerableCalc > to exploit this feature as much as possible. Basically I had to create > MyOwnEnumerableCalc (which extends EnumerableCalc) to override the > passThroughTraits method. The reason is that in my project I can produce an > IndexScan with some "special" collations (sub-fields on a struct field), > which are not supported by Calcite's standard collation mechanism (and > therefore were not passed through by EnumerableCalc#passThroughTraits to a > potential underlying Scan). The solution for that was simple, create > MyOwnEnumerableCalc and override passThroughTraits to provide my own > implementation considering also the specificities of my special collations, > and everything works perfectly. This is a very particular scenario that > will not be pushed upstream into standard Calcite, so probably I will have > to live with MyOwnEnumerableCalc, which I think is totally fine. And in > this case I had no problem with extension, because EnumerableCalc provides > a public constructor. > > Honesty, I never thought this change would cause such a big discussion, I > will be more careful in the future and provide arguments in modifications > like this one, before going on with the change. > I do not want to block 1.24 on such a minor thing, if the community does > not agree with my change, or we need more time for discussion, I can revert > the change and re-consider it for 1.25. Personally, with the reasons that I > have provided, I think we can keep the change and release it in 1.24. > > Best, > Ruben > > > > Le ven. 17 juil. 2020 à 08:24, Julian Hyde <[email protected]> a écrit : > > > When I write an API and do not include the word 'protected' on a > > constructor I am saying 'this is not a public API'. So someone coming > > along and adding the word 'protected' is actually making a big change. > > > > There are other ways to extend code than sub-classing. I don't know > > what use case Ruben has for extending EnumerableMergeJoin. I never > > will, because I can't see his code. EnumerableMergeJoin, like all > > RelNodes, is extensible by new RexNode operators, new types, new kinds > > of collation, and by composing with other RelNodes. With any of those > > other extension mechanisms, Calcite's language would have been richer. > > There is a lot to be said for keeping the building blocks fixed, and > > creating a new extension point only when we're tried everything else. > > > > On Thu, Jul 16, 2020 at 3:49 PM Stamatis Zampetakis <[email protected]> > > wrote: > > > > > > +1 for removing the final modifier for all the reasons mentioned so far. > > > > > > Regarding CALCITE-4123, I find Ruben's justification convincing. > > > The majority of Enumerable operators (not only joins) are extensible > > (with > > > public/protected constructors) so it seems that EnumerableMergeJoin was > > the > > > only exception to this pattern. > > > Now if we want to keep these APIs internal it is another discussion and I > > > think it should consider all Enumerable operators so for the moment I am > > > fine with the current change. > > > The only question from my side is why it is necessary to extend > > > EnumerableMergeJoin in the first place but this is mostly from curiosity, > > > and to see if there are things that should be changed in the operator to > > > avoid such extensions. > > > Let's continue the discussion under the respective JIRA if necessary. > > > > > > Best, > > > Stamatis > > > > > > > > > On Fri, Jul 17, 2020 at 1:30 AM Rui Wang <[email protected]> wrote: > > > > > > > Since the concern on CALCLITE-4123 is brought up, can we make a quick > > > > decision about whether to make it in the 1.24.0 release or not? > > > > > > > > Having the public interface increased in 1.24.0 and then later there > > might > > > > be a decision to revert it doesn't sound right. If we are not sure for > > now, > > > > maybe have a quick revert to unblock 1.24.0 release first? > > > > > > > > > > > > -Rui > > > > > > > > On Thu, Jul 16, 2020 at 3:17 PM Ruben Q L <[email protected]> wrote: > > > > > > > > > Thanks for the feedback, I will create a Jira ticket to carry out the > > > > > change and remove the "final" from AbstractRelNode#getRelTypeName. > > > > > > > > > > @Julian, I am sorry I committed that change ( > > > > > https://issues.apache.org/jira/browse/CALCITE-4123) without further > > > > > discussion. I honestly thought it was a minor detail. We can discuss > > it, > > > > > and if the community does not agree with the change, we can revert > > it. > > > > > Basically, the package-private constructor in EnumerableMergeJoin > > > > prevents > > > > > it from being extended in downstream projects. This is not consistent > > > > with > > > > > other Join operators, such as EnumerableHashJoin, > > > > EnumerableNestedLoopJoin > > > > > or EnumerbleBatchNestedLoopJoin, all of them providing a protected > > > > > constructor. So I do not see a reason why a downstream project can > > extend > > > > > e.g. EnumerableHashJoin, but is unable to extend > > EnumerableMergeJoin. I > > > > > think there is no issue in extending these clases (otherwise they > > would > > > > be > > > > > final). I believed changing the constructor from package-private to > > > > > protected would solve this issue, without causing any > > > > > problematic side-effect. > > > > > > > > > > Best, > > > > > Ruben > > > > > > > > > > > > > > > Le jeu. 16 juil. 2020 à 21:32, Julian Hyde <[email protected]> a > > écrit : > > > > > > > > > > > I don't have a problem with this change. > > > > > > > > > > > > I do have a problem with changes, such as the one that Ruben > > recently > > > > > > committed [1] to make the constructor of EnumerableMergeJoin, that > > > > > > increase the surface of our public API. The larger our public API, > > the > > > > > > more work it is to add features to Calcite while adhering to > > semantic > > > > > > versioning. > > > > > > > > > > > > 'final' is sometimes used intentionally to limit the API surface > > area. > > > > > > > > > > > > Julian > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > https://github.com/apache/calcite/commit/6bb7e2d0b65ec3c7a0d82c92ca0564f8caec4af5 > > > > > > > > > > > > On Thu, Jul 16, 2020 at 10:33 AM Rui Wang <[email protected]> > > > > wrote: > > > > > > > > > > > > > > +1 to make getRelTypeName overridable. > > > > > > > > > > > > > > Not a java language expert, to me when there are cases that a > > public > > > > > API > > > > > > > with final cannot solve, it's a good sign to remove that final to > > > > allow > > > > > > > customization to solve those specific use cases. > > > > > > > > > > > > > > > > > > > > > -Rui > > > > > > > > > > > > > > On Thu, Jul 16, 2020 at 8:58 AM Haisheng Yuan <[email protected]> > > > > > wrote: > > > > > > > > > > > > > > > +1 to remove final. > > > > > > > > > > > > > > > > The method returns the logical/physical operator name. When we > > > > > explain > > > > > > the > > > > > > > > plan (and compute the digest in 1.23.0 and before), > > getRelTypeName > > > > > may > > > > > > be > > > > > > > > called multiple times for a single operator, every time it will > > > > start > > > > > > from > > > > > > > > scratch to figure out what is correct operator name. Making it > > > > > > overridable > > > > > > > > will not only solve the case Ruben mentioned below, but also > > remove > > > > > the > > > > > > > > duplicate computation by just returning the literal string for > > the > > > > > > operator > > > > > > > > name, even if the gain might be negligible. > > > > > > > > > > > > > > > > The downside is when downstream projects who override the > > method > > > > > > refactor > > > > > > > > the operator class name, they have to manually update the > > method to > > > > > > return > > > > > > > > correct names. We get a flexibility by losing another one. > > But I > > > > > > think it > > > > > > > > is OK to add a caveat that use at your own risk. > > > > > > > > > > > > > > > > On 2020/07/16 14:32:31, Ruben Q L <[email protected]> wrote: > > > > > > > > > Hello everyone, > > > > > > > > > > > > > > > > > > I have a small issue regarding RelNode#getRelTypeName [1]. > > This > > > > > > method is > > > > > > > > > used when explaining a plan (e.g. via RelOptUtil#dumpPlan), > > and > > > > it > > > > > > > > "returns > > > > > > > > > the name of this relational expression's class, sans package > > > > name, > > > > > > for > > > > > > > > use > > > > > > > > > in explain". > > > > > > > > > This method has a *final* implementation in AbstractRelNode > > [2], > > > > > > which > > > > > > > > > returns a String based on getClass().getName(). Normally, > > this > > > > > > > > > implementation should work fine for all RelNode > > implementations, > > > > > but > > > > > > > > > currently I am working on a project that uses obfuscation, so > > > > this > > > > > > > > > implementation returns a non-meaningful name on our RelNode > > > > > > > > > implementations (e.g. our TableScan operators), which will > > lead > > > > to > > > > > > > > > incomprehensible query plans in our logs. I would like to > > > > override > > > > > > the > > > > > > > > > method on our RelNode implementations, so that they can > > return a > > > > > > "clear" > > > > > > > > > relTypeName, instead of the default one based on > > > > > > getClass().getName(), > > > > > > > > but > > > > > > > > > at the moment I cannot do that, since the implementation in > > > > > > > > AbstractRelNode > > > > > > > > > is final. > > > > > > > > > > > > > > > > > > What would you think about removing the "final" from > > > > > > > > > AbstractRelNode#getRelTypeName and allow downstream projects > > to > > > > > > provide > > > > > > > > > their own implementation? Of course, these projects shall be > > > > > > responsible > > > > > > > > > for providing a valid implementation at their own risk, we > > could > > > > > even > > > > > > > > add a > > > > > > > > > warning message in AbstractRelNode#getRelTypeName saying > > > > something > > > > > > like > > > > > > > > "do > > > > > > > > > not override this method unless you know what you are doing" > > or > > > > "it > > > > > > is > > > > > > > > > strongly discouraged to override this method". > > > > > > > > > > > > > > > > > > Of course, I know there are other alternatives for my > > problem, > > > > like > > > > > > > > > implementing my own dumpPlan and RelWriter mechanisms, but I > > > > would > > > > > > like > > > > > > > > to > > > > > > > > > explore the getRelTypeName possibility before doing that, > > because > > > > > it > > > > > > > > would > > > > > > > > > be a more straightforward solution, and it might be useful > > for > > > > > other > > > > > > > > people > > > > > > > > > as well. > > > > > > > > > > > > > > > > > > Thanks and best regards, > > > > > > > > > Ruben > > > > > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/calcite/blob/551e3f5a182a48e20586e40c378224bb63e4bfb3/core/src/main/java/org/apache/calcite/rel/RelNode.java#L367 > > > > > > > > > [2] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/calcite/blob/551e3f5a182a48e20586e40c378224bb63e4bfb3/core/src/main/java/org/apache/calcite/rel/AbstractRelNode.java#L182 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
