Re: [DISCUSS] Remove "final" from AbstractRelNode#getRelTypeName

Haisheng Yuan Fri, 17 Jul 2020 08:38:15 -0700

Agree with Ruben.

I don't see any convincing reasons to make EnumerableMergeJoin special.


On 2020/07/17 07:46:35, Ruben Q L <[email protected]> wrote: 
> Regarding CALCITE-4123, I can provide a bit more of context.
> 
> I have been working on some improvements in EnumerableMergeJoin (support
> new join types). I found that the easiest way to do so was extending the
> class as MyOwnEnumerableMergeJoin and override the "implement" method, so
> that instead of calling EnumerableDefault's mergeJoin, it calls my own
> mergeJoin method, where I can develop and test these evolutions.
> Eventually, once this code is satisfactory, I will push the improvements
> upstream, so that they will be part of the following release (if the
> community agrees), and I can get rid of MyOwnEnumerableMergeJoin and use
> the default one instead. But this is a process that takes some time, so for
> a certain period I have the improvements in my project via
> MyOwnEnumerableMergeJoin, which is fine. Today this is not possible,
> because EnumerableMergeJoin is not extensible (due to the package-private
> constructor) so I need to create MyOwnEnumerableMergeJoin "from scratch"
> (extending Join) and copy-pasting most of the code from
> EnumerableMergeJoin. So it is possible to achieve my goal, but this is
> (IMHO) not optimal. Moreover, I find it hard to understand why one could
> use this extension-approach on EnumerableHashJoin or
> EnumerableNestedLoopJoin (with protected constructors), but not with
> EnumerableMergeJoin (with package-private constructor). Why is MergeJoin so
> special?
> 
> Recently I found a different example of extending enumerable operators. I
> started to test the topDown mechanism in Volcano Planner (great feature
> btw), and I found myself in a position where I had to extend EnumerableCalc
> to exploit this feature as much as possible. Basically I had to create
> MyOwnEnumerableCalc (which extends EnumerableCalc) to override the
> passThroughTraits method. The reason is that in my project I can produce an
> IndexScan with some "special" collations (sub-fields on a struct field),
> which are not supported by Calcite's standard collation mechanism (and
> therefore were not passed through by EnumerableCalc#passThroughTraits to a
> potential underlying Scan). The solution for that was simple, create
> MyOwnEnumerableCalc and override passThroughTraits to provide my own
> implementation considering also the specificities of my special collations,
> and everything works perfectly. This is a very particular scenario that
> will not be pushed upstream into standard Calcite, so probably I will have
> to live with MyOwnEnumerableCalc, which I think is totally fine. And in
> this case I had no problem with extension, because EnumerableCalc provides
> a public constructor.
> 
> Honesty, I never thought this change would cause such a big discussion, I
> will be more careful in the future and provide arguments in modifications
> like this one, before going on with the change.
> I do not want to block 1.24 on such a minor thing, if the community does
> not agree with my change, or we need more time for discussion, I can revert
> the change and re-consider it for 1.25. Personally, with the reasons that I
> have provided, I think we can keep the change and release it in 1.24.
> 
> Best,
> Ruben
> 
> 
> 
> Le ven. 17 juil. 2020 à 08:24, Julian Hyde <[email protected]> a écrit :
> 
> > When I write an API and do not include the word 'protected' on a
> > constructor I am saying 'this is not a public API'. So someone coming
> > along and adding the word 'protected' is actually making a big change.
> >
> > There are other ways to extend code than sub-classing. I don't know
> > what use case Ruben has for extending EnumerableMergeJoin. I never
> > will, because I can't see his code. EnumerableMergeJoin, like all
> > RelNodes, is extensible by new RexNode operators, new types, new kinds
> > of collation, and by composing with other RelNodes. With any of those
> > other extension mechanisms, Calcite's language would have been richer.
> > There is a lot to be said for keeping the building blocks fixed, and
> > creating a new extension point only when we're tried everything else.
> >
> > On Thu, Jul 16, 2020 at 3:49 PM Stamatis Zampetakis <[email protected]>
> > wrote:
> > >
> > > +1 for removing the final modifier for all the reasons mentioned so far.
> > >
> > > Regarding CALCITE-4123, I find Ruben's justification convincing.
> > > The majority of Enumerable operators (not only joins) are extensible
> > (with
> > > public/protected constructors) so it seems that EnumerableMergeJoin was
> > the
> > > only exception to this pattern.
> > > Now if we want to keep these APIs internal it is another discussion and I
> > > think it should consider all Enumerable operators so for the moment I am
> > > fine with the current change.
> > > The only question from my side is why it is necessary to extend
> > > EnumerableMergeJoin in the first place but this is mostly from curiosity,
> > > and to see if there are things that should be changed in the operator to
> > > avoid such extensions.
> > > Let's continue the discussion under the respective JIRA if necessary.
> > >
> > > Best,
> > > Stamatis
> > >
> > >
> > > On Fri, Jul 17, 2020 at 1:30 AM Rui Wang <[email protected]> wrote:
> > >
> > > > Since the concern on CALCLITE-4123 is brought up, can we make a quick
> > > > decision about whether to make it in the 1.24.0 release or not?
> > > >
> > > > Having the public interface increased in 1.24.0 and then later there
> > might
> > > > be a decision to revert it doesn't sound right. If we are not sure for
> > now,
> > > > maybe have a quick revert to unblock 1.24.0 release first?
> > > >
> > > >
> > > > -Rui
> > > >
> > > > On Thu, Jul 16, 2020 at 3:17 PM Ruben Q L <[email protected]> wrote:
> > > >
> > > > > Thanks for the feedback, I will create a Jira ticket to carry out the
> > > > > change and remove the "final" from AbstractRelNode#getRelTypeName.
> > > > >
> > > > > @Julian, I am sorry I committed that change (
> > > > > https://issues.apache.org/jira/browse/CALCITE-4123) without further
> > > > > discussion. I honestly thought it was a minor detail. We can discuss
> > it,
> > > > > and if the community does not agree with the change, we can revert
> > it.
> > > > > Basically, the package-private constructor in EnumerableMergeJoin
> > > > prevents
> > > > > it from being extended in downstream projects. This is not consistent
> > > > with
> > > > > other Join operators, such as EnumerableHashJoin,
> > > > EnumerableNestedLoopJoin
> > > > > or EnumerbleBatchNestedLoopJoin, all of them providing a protected
> > > > > constructor. So I do not see a reason why a downstream project can
> > extend
> > > > > e.g. EnumerableHashJoin, but is unable to extend
> > EnumerableMergeJoin. I
> > > > > think there is no issue in extending these clases (otherwise they
> > would
> > > > be
> > > > > final). I believed changing the constructor from package-private to
> > > > > protected would solve this issue, without causing any
> > > > > problematic side-effect.
> > > > >
> > > > > Best,
> > > > > Ruben
> > > > >
> > > > >
> > > > > Le jeu. 16 juil. 2020 à 21:32, Julian Hyde <[email protected]> a
> > écrit :
> > > > >
> > > > > > I don't have a problem with this change.
> > > > > >
> > > > > > I do have a problem with changes, such as the one that Ruben
> > recently
> > > > > > committed [1] to make the constructor of EnumerableMergeJoin, that
> > > > > > increase the surface of our public API. The larger our public API,
> > the
> > > > > > more work it is to add features to Calcite while adhering to
> > semantic
> > > > > > versioning.
> > > > > >
> > > > > > 'final' is sometimes used intentionally to limit the API surface
> > area.
> > > > > >
> > > > > > Julian
> > > > > >
> > > > > > [1]
> > > > > >
> > > > >
> > > >
> > https://github.com/apache/calcite/commit/6bb7e2d0b65ec3c7a0d82c92ca0564f8caec4af5
> > > > > >
> > > > > > On Thu, Jul 16, 2020 at 10:33 AM Rui Wang <[email protected]>
> > > > wrote:
> > > > > > >
> > > > > > > +1 to make getRelTypeName overridable.
> > > > > > >
> > > > > > > Not a java language expert, to me when there are cases that a
> > public
> > > > > API
> > > > > > > with final cannot solve, it's a good sign to remove that final to
> > > > allow
> > > > > > > customization to solve those specific use cases.
> > > > > > >
> > > > > > >
> > > > > > > -Rui
> > > > > > >
> > > > > > > On Thu, Jul 16, 2020 at 8:58 AM Haisheng Yuan <[email protected]>
> > > > > wrote:
> > > > > > >
> > > > > > > > +1 to remove final.
> > > > > > > >
> > > > > > > > The method returns the logical/physical operator name. When we
> > > > > explain
> > > > > > the
> > > > > > > > plan (and compute the digest in 1.23.0 and before),
> > getRelTypeName
> > > > > may
> > > > > > be
> > > > > > > > called multiple times for a single operator, every time it will
> > > > start
> > > > > > from
> > > > > > > > scratch to figure out what is correct operator name. Making it
> > > > > > overridable
> > > > > > > > will not only solve the case Ruben mentioned below, but also
> > remove
> > > > > the
> > > > > > > > duplicate computation by just returning the literal string for
> > the
> > > > > > operator
> > > > > > > > name, even if the gain might be negligible.
> > > > > > > >
> > > > > > > > The downside is when downstream projects who override the
> > method
> > > > > > refactor
> > > > > > > > the operator class name, they have to manually update the
> > method to
> > > > > > return
> > > > > > > > correct names. We get a flexibility by losing another one.
> > But I
> > > > > > think it
> > > > > > > > is OK to add a caveat that use at your own risk.
> > > > > > > >
> > > > > > > > On 2020/07/16 14:32:31, Ruben Q L <[email protected]> wrote:
> > > > > > > > > Hello everyone,
> > > > > > > > >
> > > > > > > > > I have a small issue regarding RelNode#getRelTypeName [1].
> > This
> > > > > > method is
> > > > > > > > > used when explaining a plan (e.g. via RelOptUtil#dumpPlan),
> > and
> > > > it
> > > > > > > > "returns
> > > > > > > > > the name of this relational expression's class, sans package
> > > > name,
> > > > > > for
> > > > > > > > use
> > > > > > > > > in explain".
> > > > > > > > > This method has a *final* implementation in AbstractRelNode
> > [2],
> > > > > > which
> > > > > > > > > returns a String based on getClass().getName(). Normally,
> > this
> > > > > > > > > implementation should work fine for all RelNode
> > implementations,
> > > > > but
> > > > > > > > > currently I am working on a project that uses obfuscation, so
> > > > this
> > > > > > > > > implementation returns a non-meaningful name on our RelNode
> > > > > > > > > implementations (e.g. our TableScan operators), which will
> > lead
> > > > to
> > > > > > > > > incomprehensible query plans in our logs. I would like to
> > > > override
> > > > > > the
> > > > > > > > > method on our RelNode  implementations, so that they can
> > return a
> > > > > > "clear"
> > > > > > > > > relTypeName, instead of the default one based on
> > > > > > getClass().getName(),
> > > > > > > > but
> > > > > > > > > at the moment I cannot do that, since the implementation in
> > > > > > > > AbstractRelNode
> > > > > > > > > is final.
> > > > > > > > >
> > > > > > > > > What would you think about removing the "final" from
> > > > > > > > > AbstractRelNode#getRelTypeName and allow downstream projects
> > to
> > > > > > provide
> > > > > > > > > their own implementation? Of course, these projects shall be
> > > > > > responsible
> > > > > > > > > for providing a valid implementation at their own risk, we
> > could
> > > > > even
> > > > > > > > add a
> > > > > > > > > warning message in AbstractRelNode#getRelTypeName saying
> > > > something
> > > > > > like
> > > > > > > > "do
> > > > > > > > > not override this method unless you know what you are doing"
> > or
> > > > "it
> > > > > > is
> > > > > > > > > strongly discouraged to override this method".
> > > > > > > > >
> > > > > > > > > Of course, I know there are other alternatives for my
> > problem,
> > > > like
> > > > > > > > > implementing my own dumpPlan and RelWriter mechanisms, but I
> > > > would
> > > > > > like
> > > > > > > > to
> > > > > > > > > explore the getRelTypeName possibility before doing that,
> > because
> > > > > it
> > > > > > > > would
> > > > > > > > > be a more straightforward solution, and it might be useful
> > for
> > > > > other
> > > > > > > > people
> > > > > > > > > as well.
> > > > > > > > >
> > > > > > > > > Thanks and best regards,
> > > > > > > > > Ruben
> > > > > > > > >
> > > > > > > > > [1]
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > https://github.com/apache/calcite/blob/551e3f5a182a48e20586e40c378224bb63e4bfb3/core/src/main/java/org/apache/calcite/rel/RelNode.java#L367
> > > > > > > > > [2]
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > https://github.com/apache/calcite/blob/551e3f5a182a48e20586e40c378224bb63e4bfb3/core/src/main/java/org/apache/calcite/rel/AbstractRelNode.java#L182
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> >
>

Re: [DISCUSS] Remove "final" from AbstractRelNode#getRelTypeName

Reply via email to