Several of the constructors of Enumerable RelNodes have the comment "Use {@link
#create} unless you know what you're doing”. I think it’s pretty clear - the
class and constructor had to be public for various reasons, but people should
not treat it as a blank check.
If your main motivation is consistency, you might have mentioned that comment.
But I suspect many people are motivated by maximizing freedom for downstream
projects. That’s fine, but that freedom is a burden on Calcite, and it is not
clear-cut that more freedom for downstream projects is a net benefit for
Calcite. As committers we should be thinking solely of the last consideration.
Julian
> On Jul 17, 2020, at 8:37 AM, Haisheng Yuan <[email protected]> wrote:
>
> Agree with Ruben.
>
> I don't see any convincing reasons to make EnumerableMergeJoin special.
>
> On 2020/07/17 07:46:35, Ruben Q L <[email protected]> wrote:
>> Regarding CALCITE-4123, I can provide a bit more of context.
>>
>> I have been working on some improvements in EnumerableMergeJoin (support
>> new join types). I found that the easiest way to do so was extending the
>> class as MyOwnEnumerableMergeJoin and override the "implement" method, so
>> that instead of calling EnumerableDefault's mergeJoin, it calls my own
>> mergeJoin method, where I can develop and test these evolutions.
>> Eventually, once this code is satisfactory, I will push the improvements
>> upstream, so that they will be part of the following release (if the
>> community agrees), and I can get rid of MyOwnEnumerableMergeJoin and use
>> the default one instead. But this is a process that takes some time, so for
>> a certain period I have the improvements in my project via
>> MyOwnEnumerableMergeJoin, which is fine. Today this is not possible,
>> because EnumerableMergeJoin is not extensible (due to the package-private
>> constructor) so I need to create MyOwnEnumerableMergeJoin "from scratch"
>> (extending Join) and copy-pasting most of the code from
>> EnumerableMergeJoin. So it is possible to achieve my goal, but this is
>> (IMHO) not optimal. Moreover, I find it hard to understand why one could
>> use this extension-approach on EnumerableHashJoin or
>> EnumerableNestedLoopJoin (with protected constructors), but not with
>> EnumerableMergeJoin (with package-private constructor). Why is MergeJoin so
>> special?
>>
>> Recently I found a different example of extending enumerable operators. I
>> started to test the topDown mechanism in Volcano Planner (great feature
>> btw), and I found myself in a position where I had to extend EnumerableCalc
>> to exploit this feature as much as possible. Basically I had to create
>> MyOwnEnumerableCalc (which extends EnumerableCalc) to override the
>> passThroughTraits method. The reason is that in my project I can produce an
>> IndexScan with some "special" collations (sub-fields on a struct field),
>> which are not supported by Calcite's standard collation mechanism (and
>> therefore were not passed through by EnumerableCalc#passThroughTraits to a
>> potential underlying Scan). The solution for that was simple, create
>> MyOwnEnumerableCalc and override passThroughTraits to provide my own
>> implementation considering also the specificities of my special collations,
>> and everything works perfectly. This is a very particular scenario that
>> will not be pushed upstream into standard Calcite, so probably I will have
>> to live with MyOwnEnumerableCalc, which I think is totally fine. And in
>> this case I had no problem with extension, because EnumerableCalc provides
>> a public constructor.
>>
>> Honesty, I never thought this change would cause such a big discussion, I
>> will be more careful in the future and provide arguments in modifications
>> like this one, before going on with the change.
>> I do not want to block 1.24 on such a minor thing, if the community does
>> not agree with my change, or we need more time for discussion, I can revert
>> the change and re-consider it for 1.25. Personally, with the reasons that I
>> have provided, I think we can keep the change and release it in 1.24.
>>
>> Best,
>> Ruben
>>
>>
>>
>> Le ven. 17 juil. 2020 à 08:24, Julian Hyde <[email protected]> a écrit :
>>
>>> When I write an API and do not include the word 'protected' on a
>>> constructor I am saying 'this is not a public API'. So someone coming
>>> along and adding the word 'protected' is actually making a big change.
>>>
>>> There are other ways to extend code than sub-classing. I don't know
>>> what use case Ruben has for extending EnumerableMergeJoin. I never
>>> will, because I can't see his code. EnumerableMergeJoin, like all
>>> RelNodes, is extensible by new RexNode operators, new types, new kinds
>>> of collation, and by composing with other RelNodes. With any of those
>>> other extension mechanisms, Calcite's language would have been richer.
>>> There is a lot to be said for keeping the building blocks fixed, and
>>> creating a new extension point only when we're tried everything else.
>>>
>>> On Thu, Jul 16, 2020 at 3:49 PM Stamatis Zampetakis <[email protected]>
>>> wrote:
>>>>
>>>> +1 for removing the final modifier for all the reasons mentioned so far.
>>>>
>>>> Regarding CALCITE-4123, I find Ruben's justification convincing.
>>>> The majority of Enumerable operators (not only joins) are extensible
>>> (with
>>>> public/protected constructors) so it seems that EnumerableMergeJoin was
>>> the
>>>> only exception to this pattern.
>>>> Now if we want to keep these APIs internal it is another discussion and I
>>>> think it should consider all Enumerable operators so for the moment I am
>>>> fine with the current change.
>>>> The only question from my side is why it is necessary to extend
>>>> EnumerableMergeJoin in the first place but this is mostly from curiosity,
>>>> and to see if there are things that should be changed in the operator to
>>>> avoid such extensions.
>>>> Let's continue the discussion under the respective JIRA if necessary.
>>>>
>>>> Best,
>>>> Stamatis
>>>>
>>>>
>>>> On Fri, Jul 17, 2020 at 1:30 AM Rui Wang <[email protected]> wrote:
>>>>
>>>>> Since the concern on CALCLITE-4123 is brought up, can we make a quick
>>>>> decision about whether to make it in the 1.24.0 release or not?
>>>>>
>>>>> Having the public interface increased in 1.24.0 and then later there
>>> might
>>>>> be a decision to revert it doesn't sound right. If we are not sure for
>>> now,
>>>>> maybe have a quick revert to unblock 1.24.0 release first?
>>>>>
>>>>>
>>>>> -Rui
>>>>>
>>>>> On Thu, Jul 16, 2020 at 3:17 PM Ruben Q L <[email protected]> wrote:
>>>>>
>>>>>> Thanks for the feedback, I will create a Jira ticket to carry out the
>>>>>> change and remove the "final" from AbstractRelNode#getRelTypeName.
>>>>>>
>>>>>> @Julian, I am sorry I committed that change (
>>>>>> https://issues.apache.org/jira/browse/CALCITE-4123) without further
>>>>>> discussion. I honestly thought it was a minor detail. We can discuss
>>> it,
>>>>>> and if the community does not agree with the change, we can revert
>>> it.
>>>>>> Basically, the package-private constructor in EnumerableMergeJoin
>>>>> prevents
>>>>>> it from being extended in downstream projects. This is not consistent
>>>>> with
>>>>>> other Join operators, such as EnumerableHashJoin,
>>>>> EnumerableNestedLoopJoin
>>>>>> or EnumerbleBatchNestedLoopJoin, all of them providing a protected
>>>>>> constructor. So I do not see a reason why a downstream project can
>>> extend
>>>>>> e.g. EnumerableHashJoin, but is unable to extend
>>> EnumerableMergeJoin. I
>>>>>> think there is no issue in extending these clases (otherwise they
>>> would
>>>>> be
>>>>>> final). I believed changing the constructor from package-private to
>>>>>> protected would solve this issue, without causing any
>>>>>> problematic side-effect.
>>>>>>
>>>>>> Best,
>>>>>> Ruben
>>>>>>
>>>>>>
>>>>>> Le jeu. 16 juil. 2020 à 21:32, Julian Hyde <[email protected]> a
>>> écrit :
>>>>>>
>>>>>>> I don't have a problem with this change.
>>>>>>>
>>>>>>> I do have a problem with changes, such as the one that Ruben
>>> recently
>>>>>>> committed [1] to make the constructor of EnumerableMergeJoin, that
>>>>>>> increase the surface of our public API. The larger our public API,
>>> the
>>>>>>> more work it is to add features to Calcite while adhering to
>>> semantic
>>>>>>> versioning.
>>>>>>>
>>>>>>> 'final' is sometimes used intentionally to limit the API surface
>>> area.
>>>>>>>
>>>>>>> Julian
>>>>>>>
>>>>>>> [1]
>>>>>>>
>>>>>>
>>>>>
>>> https://github.com/apache/calcite/commit/6bb7e2d0b65ec3c7a0d82c92ca0564f8caec4af5
>>>>>>>
>>>>>>> On Thu, Jul 16, 2020 at 10:33 AM Rui Wang <[email protected]>
>>>>> wrote:
>>>>>>>>
>>>>>>>> +1 to make getRelTypeName overridable.
>>>>>>>>
>>>>>>>> Not a java language expert, to me when there are cases that a
>>> public
>>>>>> API
>>>>>>>> with final cannot solve, it's a good sign to remove that final to
>>>>> allow
>>>>>>>> customization to solve those specific use cases.
>>>>>>>>
>>>>>>>>
>>>>>>>> -Rui
>>>>>>>>
>>>>>>>> On Thu, Jul 16, 2020 at 8:58 AM Haisheng Yuan <[email protected]>
>>>>>> wrote:
>>>>>>>>
>>>>>>>>> +1 to remove final.
>>>>>>>>>
>>>>>>>>> The method returns the logical/physical operator name. When we
>>>>>> explain
>>>>>>> the
>>>>>>>>> plan (and compute the digest in 1.23.0 and before),
>>> getRelTypeName
>>>>>> may
>>>>>>> be
>>>>>>>>> called multiple times for a single operator, every time it will
>>>>> start
>>>>>>> from
>>>>>>>>> scratch to figure out what is correct operator name. Making it
>>>>>>> overridable
>>>>>>>>> will not only solve the case Ruben mentioned below, but also
>>> remove
>>>>>> the
>>>>>>>>> duplicate computation by just returning the literal string for
>>> the
>>>>>>> operator
>>>>>>>>> name, even if the gain might be negligible.
>>>>>>>>>
>>>>>>>>> The downside is when downstream projects who override the
>>> method
>>>>>>> refactor
>>>>>>>>> the operator class name, they have to manually update the
>>> method to
>>>>>>> return
>>>>>>>>> correct names. We get a flexibility by losing another one.
>>> But I
>>>>>>> think it
>>>>>>>>> is OK to add a caveat that use at your own risk.
>>>>>>>>>
>>>>>>>>> On 2020/07/16 14:32:31, Ruben Q L <[email protected]> wrote:
>>>>>>>>>> Hello everyone,
>>>>>>>>>>
>>>>>>>>>> I have a small issue regarding RelNode#getRelTypeName [1].
>>> This
>>>>>>> method is
>>>>>>>>>> used when explaining a plan (e.g. via RelOptUtil#dumpPlan),
>>> and
>>>>> it
>>>>>>>>> "returns
>>>>>>>>>> the name of this relational expression's class, sans package
>>>>> name,
>>>>>>> for
>>>>>>>>> use
>>>>>>>>>> in explain".
>>>>>>>>>> This method has a *final* implementation in AbstractRelNode
>>> [2],
>>>>>>> which
>>>>>>>>>> returns a String based on getClass().getName(). Normally,
>>> this
>>>>>>>>>> implementation should work fine for all RelNode
>>> implementations,
>>>>>> but
>>>>>>>>>> currently I am working on a project that uses obfuscation, so
>>>>> this
>>>>>>>>>> implementation returns a non-meaningful name on our RelNode
>>>>>>>>>> implementations (e.g. our TableScan operators), which will
>>> lead
>>>>> to
>>>>>>>>>> incomprehensible query plans in our logs. I would like to
>>>>> override
>>>>>>> the
>>>>>>>>>> method on our RelNode implementations, so that they can
>>> return a
>>>>>>> "clear"
>>>>>>>>>> relTypeName, instead of the default one based on
>>>>>>> getClass().getName(),
>>>>>>>>> but
>>>>>>>>>> at the moment I cannot do that, since the implementation in
>>>>>>>>> AbstractRelNode
>>>>>>>>>> is final.
>>>>>>>>>>
>>>>>>>>>> What would you think about removing the "final" from
>>>>>>>>>> AbstractRelNode#getRelTypeName and allow downstream projects
>>> to
>>>>>>> provide
>>>>>>>>>> their own implementation? Of course, these projects shall be
>>>>>>> responsible
>>>>>>>>>> for providing a valid implementation at their own risk, we
>>> could
>>>>>> even
>>>>>>>>> add a
>>>>>>>>>> warning message in AbstractRelNode#getRelTypeName saying
>>>>> something
>>>>>>> like
>>>>>>>>> "do
>>>>>>>>>> not override this method unless you know what you are doing"
>>> or
>>>>> "it
>>>>>>> is
>>>>>>>>>> strongly discouraged to override this method".
>>>>>>>>>>
>>>>>>>>>> Of course, I know there are other alternatives for my
>>> problem,
>>>>> like
>>>>>>>>>> implementing my own dumpPlan and RelWriter mechanisms, but I
>>>>> would
>>>>>>> like
>>>>>>>>> to
>>>>>>>>>> explore the getRelTypeName possibility before doing that,
>>> because
>>>>>> it
>>>>>>>>> would
>>>>>>>>>> be a more straightforward solution, and it might be useful
>>> for
>>>>>> other
>>>>>>>>> people
>>>>>>>>>> as well.
>>>>>>>>>>
>>>>>>>>>> Thanks and best regards,
>>>>>>>>>> Ruben
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>> https://github.com/apache/calcite/blob/551e3f5a182a48e20586e40c378224bb63e4bfb3/core/src/main/java/org/apache/calcite/rel/RelNode.java#L367
>>>>>>>>>> [2]
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>> https://github.com/apache/calcite/blob/551e3f5a182a48e20586e40c378224bb63e4bfb3/core/src/main/java/org/apache/calcite/rel/AbstractRelNode.java#L182
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>>