Regarding CALCITE-4123, I can provide a bit more of context.

I have been working on some improvements in EnumerableMergeJoin (support
new join types). I found that the easiest way to do so was extending the
class as MyOwnEnumerableMergeJoin and override the "implement" method, so
that instead of calling EnumerableDefault's mergeJoin, it calls my own
mergeJoin method, where I can develop and test these evolutions.
Eventually, once this code is satisfactory, I will push the improvements
upstream, so that they will be part of the following release (if the
community agrees), and I can get rid of MyOwnEnumerableMergeJoin and use
the default one instead. But this is a process that takes some time, so for
a certain period I have the improvements in my project via
MyOwnEnumerableMergeJoin, which is fine. Today this is not possible,
because EnumerableMergeJoin is not extensible (due to the package-private
constructor) so I need to create MyOwnEnumerableMergeJoin "from scratch"
(extending Join) and copy-pasting most of the code from
EnumerableMergeJoin. So it is possible to achieve my goal, but this is
(IMHO) not optimal. Moreover, I find it hard to understand why one could
use this extension-approach on EnumerableHashJoin or
EnumerableNestedLoopJoin (with protected constructors), but not with
EnumerableMergeJoin (with package-private constructor). Why is MergeJoin so
special?

Recently I found a different example of extending enumerable operators. I
started to test the topDown mechanism in Volcano Planner (great feature
btw), and I found myself in a position where I had to extend EnumerableCalc
to exploit this feature as much as possible. Basically I had to create
MyOwnEnumerableCalc (which extends EnumerableCalc) to override the
passThroughTraits method. The reason is that in my project I can produce an
IndexScan with some "special" collations (sub-fields on a struct field),
which are not supported by Calcite's standard collation mechanism (and
therefore were not passed through by EnumerableCalc#passThroughTraits to a
potential underlying Scan). The solution for that was simple, create
MyOwnEnumerableCalc and override passThroughTraits to provide my own
implementation considering also the specificities of my special collations,
and everything works perfectly. This is a very particular scenario that
will not be pushed upstream into standard Calcite, so probably I will have
to live with MyOwnEnumerableCalc, which I think is totally fine. And in
this case I had no problem with extension, because EnumerableCalc provides
a public constructor.

Honesty, I never thought this change would cause such a big discussion, I
will be more careful in the future and provide arguments in modifications
like this one, before going on with the change.
I do not want to block 1.24 on such a minor thing, if the community does
not agree with my change, or we need more time for discussion, I can revert
the change and re-consider it for 1.25. Personally, with the reasons that I
have provided, I think we can keep the change and release it in 1.24.

Best,
Ruben



Le ven. 17 juil. 2020 à 08:24, Julian Hyde <[email protected]> a écrit :

> When I write an API and do not include the word 'protected' on a
> constructor I am saying 'this is not a public API'. So someone coming
> along and adding the word 'protected' is actually making a big change.
>
> There are other ways to extend code than sub-classing. I don't know
> what use case Ruben has for extending EnumerableMergeJoin. I never
> will, because I can't see his code. EnumerableMergeJoin, like all
> RelNodes, is extensible by new RexNode operators, new types, new kinds
> of collation, and by composing with other RelNodes. With any of those
> other extension mechanisms, Calcite's language would have been richer.
> There is a lot to be said for keeping the building blocks fixed, and
> creating a new extension point only when we're tried everything else.
>
> On Thu, Jul 16, 2020 at 3:49 PM Stamatis Zampetakis <[email protected]>
> wrote:
> >
> > +1 for removing the final modifier for all the reasons mentioned so far.
> >
> > Regarding CALCITE-4123, I find Ruben's justification convincing.
> > The majority of Enumerable operators (not only joins) are extensible
> (with
> > public/protected constructors) so it seems that EnumerableMergeJoin was
> the
> > only exception to this pattern.
> > Now if we want to keep these APIs internal it is another discussion and I
> > think it should consider all Enumerable operators so for the moment I am
> > fine with the current change.
> > The only question from my side is why it is necessary to extend
> > EnumerableMergeJoin in the first place but this is mostly from curiosity,
> > and to see if there are things that should be changed in the operator to
> > avoid such extensions.
> > Let's continue the discussion under the respective JIRA if necessary.
> >
> > Best,
> > Stamatis
> >
> >
> > On Fri, Jul 17, 2020 at 1:30 AM Rui Wang <[email protected]> wrote:
> >
> > > Since the concern on CALCLITE-4123 is brought up, can we make a quick
> > > decision about whether to make it in the 1.24.0 release or not?
> > >
> > > Having the public interface increased in 1.24.0 and then later there
> might
> > > be a decision to revert it doesn't sound right. If we are not sure for
> now,
> > > maybe have a quick revert to unblock 1.24.0 release first?
> > >
> > >
> > > -Rui
> > >
> > > On Thu, Jul 16, 2020 at 3:17 PM Ruben Q L <[email protected]> wrote:
> > >
> > > > Thanks for the feedback, I will create a Jira ticket to carry out the
> > > > change and remove the "final" from AbstractRelNode#getRelTypeName.
> > > >
> > > > @Julian, I am sorry I committed that change (
> > > > https://issues.apache.org/jira/browse/CALCITE-4123) without further
> > > > discussion. I honestly thought it was a minor detail. We can discuss
> it,
> > > > and if the community does not agree with the change, we can revert
> it.
> > > > Basically, the package-private constructor in EnumerableMergeJoin
> > > prevents
> > > > it from being extended in downstream projects. This is not consistent
> > > with
> > > > other Join operators, such as EnumerableHashJoin,
> > > EnumerableNestedLoopJoin
> > > > or EnumerbleBatchNestedLoopJoin, all of them providing a protected
> > > > constructor. So I do not see a reason why a downstream project can
> extend
> > > > e.g. EnumerableHashJoin, but is unable to extend
> EnumerableMergeJoin. I
> > > > think there is no issue in extending these clases (otherwise they
> would
> > > be
> > > > final). I believed changing the constructor from package-private to
> > > > protected would solve this issue, without causing any
> > > > problematic side-effect.
> > > >
> > > > Best,
> > > > Ruben
> > > >
> > > >
> > > > Le jeu. 16 juil. 2020 à 21:32, Julian Hyde <[email protected]> a
> écrit :
> > > >
> > > > > I don't have a problem with this change.
> > > > >
> > > > > I do have a problem with changes, such as the one that Ruben
> recently
> > > > > committed [1] to make the constructor of EnumerableMergeJoin, that
> > > > > increase the surface of our public API. The larger our public API,
> the
> > > > > more work it is to add features to Calcite while adhering to
> semantic
> > > > > versioning.
> > > > >
> > > > > 'final' is sometimes used intentionally to limit the API surface
> area.
> > > > >
> > > > > Julian
> > > > >
> > > > > [1]
> > > > >
> > > >
> > >
> https://github.com/apache/calcite/commit/6bb7e2d0b65ec3c7a0d82c92ca0564f8caec4af5
> > > > >
> > > > > On Thu, Jul 16, 2020 at 10:33 AM Rui Wang <[email protected]>
> > > wrote:
> > > > > >
> > > > > > +1 to make getRelTypeName overridable.
> > > > > >
> > > > > > Not a java language expert, to me when there are cases that a
> public
> > > > API
> > > > > > with final cannot solve, it's a good sign to remove that final to
> > > allow
> > > > > > customization to solve those specific use cases.
> > > > > >
> > > > > >
> > > > > > -Rui
> > > > > >
> > > > > > On Thu, Jul 16, 2020 at 8:58 AM Haisheng Yuan <[email protected]>
> > > > wrote:
> > > > > >
> > > > > > > +1 to remove final.
> > > > > > >
> > > > > > > The method returns the logical/physical operator name. When we
> > > > explain
> > > > > the
> > > > > > > plan (and compute the digest in 1.23.0 and before),
> getRelTypeName
> > > > may
> > > > > be
> > > > > > > called multiple times for a single operator, every time it will
> > > start
> > > > > from
> > > > > > > scratch to figure out what is correct operator name. Making it
> > > > > overridable
> > > > > > > will not only solve the case Ruben mentioned below, but also
> remove
> > > > the
> > > > > > > duplicate computation by just returning the literal string for
> the
> > > > > operator
> > > > > > > name, even if the gain might be negligible.
> > > > > > >
> > > > > > > The downside is when downstream projects who override the
> method
> > > > > refactor
> > > > > > > the operator class name, they have to manually update the
> method to
> > > > > return
> > > > > > > correct names. We get a flexibility by losing another one.
> But I
> > > > > think it
> > > > > > > is OK to add a caveat that use at your own risk.
> > > > > > >
> > > > > > > On 2020/07/16 14:32:31, Ruben Q L <[email protected]> wrote:
> > > > > > > > Hello everyone,
> > > > > > > >
> > > > > > > > I have a small issue regarding RelNode#getRelTypeName [1].
> This
> > > > > method is
> > > > > > > > used when explaining a plan (e.g. via RelOptUtil#dumpPlan),
> and
> > > it
> > > > > > > "returns
> > > > > > > > the name of this relational expression's class, sans package
> > > name,
> > > > > for
> > > > > > > use
> > > > > > > > in explain".
> > > > > > > > This method has a *final* implementation in AbstractRelNode
> [2],
> > > > > which
> > > > > > > > returns a String based on getClass().getName(). Normally,
> this
> > > > > > > > implementation should work fine for all RelNode
> implementations,
> > > > but
> > > > > > > > currently I am working on a project that uses obfuscation, so
> > > this
> > > > > > > > implementation returns a non-meaningful name on our RelNode
> > > > > > > > implementations (e.g. our TableScan operators), which will
> lead
> > > to
> > > > > > > > incomprehensible query plans in our logs. I would like to
> > > override
> > > > > the
> > > > > > > > method on our RelNode  implementations, so that they can
> return a
> > > > > "clear"
> > > > > > > > relTypeName, instead of the default one based on
> > > > > getClass().getName(),
> > > > > > > but
> > > > > > > > at the moment I cannot do that, since the implementation in
> > > > > > > AbstractRelNode
> > > > > > > > is final.
> > > > > > > >
> > > > > > > > What would you think about removing the "final" from
> > > > > > > > AbstractRelNode#getRelTypeName and allow downstream projects
> to
> > > > > provide
> > > > > > > > their own implementation? Of course, these projects shall be
> > > > > responsible
> > > > > > > > for providing a valid implementation at their own risk, we
> could
> > > > even
> > > > > > > add a
> > > > > > > > warning message in AbstractRelNode#getRelTypeName saying
> > > something
> > > > > like
> > > > > > > "do
> > > > > > > > not override this method unless you know what you are doing"
> or
> > > "it
> > > > > is
> > > > > > > > strongly discouraged to override this method".
> > > > > > > >
> > > > > > > > Of course, I know there are other alternatives for my
> problem,
> > > like
> > > > > > > > implementing my own dumpPlan and RelWriter mechanisms, but I
> > > would
> > > > > like
> > > > > > > to
> > > > > > > > explore the getRelTypeName possibility before doing that,
> because
> > > > it
> > > > > > > would
> > > > > > > > be a more straightforward solution, and it might be useful
> for
> > > > other
> > > > > > > people
> > > > > > > > as well.
> > > > > > > >
> > > > > > > > Thanks and best regards,
> > > > > > > > Ruben
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> https://github.com/apache/calcite/blob/551e3f5a182a48e20586e40c378224bb63e4bfb3/core/src/main/java/org/apache/calcite/rel/RelNode.java#L367
> > > > > > > > [2]
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> https://github.com/apache/calcite/blob/551e3f5a182a48e20586e40c378224bb63e4bfb3/core/src/main/java/org/apache/calcite/rel/AbstractRelNode.java#L182
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
>

Reply via email to