The RelMetadata system is designed for these kinds of annotations - if there is 
a “global hints cache” there’s no benefit to doing it outside the RelMetadata 
system.

That said, I don’t know (and I don’t think anyone knows) how we want hints to 
be propagated as we generate RelNodes from RelNodes. I think we should focus on 
really simple cases first (e.g. hints about the whole query, or about 
particular table scans), and not try to automatically propagate them.

We can make the hints propagation mechanism more sophisticated when we have an 
actual use case to drive us.

Julian
 

> On Apr 26, 2019, at 3:41 PM, Yuzhao Chen <[email protected]> wrote:
> 
> Thx, Julian
> 
> Let me restate my thoughts on the details. To implement hints, these 
> things may be needed:
> 
> The main difference is that we would maintain a global hints cache.
> 1. Support the hints grammar in parser.jj.
> 2. During or after sql-to-rel, pass a hints cache to the 
> SqlToRelConverter. A visitor initializes the RelNodes’ hints in the cache, 
> one node at a time. The cache has global scope and stays active for the 
> whole of query planning. It keeps hints only for the few top nodes that 
> really need them.
> 3. In the Planner, add set/get methods for the hints cache, so that 
> planning rules can see it. We could also ban some rule matches in the 
> planner.
> 4. Hook the RelOptRuleCall#transformTo method to handle hint propagation 
> (invoking the same hints logic as in the sql-to-rel phase). This would also 
> update the global hints cache.
> 
> It seems that, given the global hints cache, we no longer need the 
> MetadataHandler. This is the point I most want to confirm.
> I look forward to your suggestions.
> 
> Best,
> Danny Chan
> On Apr 25, 2019, at 3:07 AM (+0800), Julian Hyde <[email protected]> wrote:
>> I think it’s OK to attach hints to the (few) RelNodes that come out of the 
>> SqlToRelConverter.
>> 
>> But it would be a mistake to try to propagate those hints to all of the 
>> RelNodes that are created during query planning. Even if we changed all of 
>> the copy methods (a huge task) there are many other ways that RelNodes get 
>> created. We would end up with a RelNode graph with lots of hints, and most 
>> of those hints would be inaccurate or not applicable.
>> 
>> For a particular hint, say "/*+ nohashjoin */", some piece of code would 
>> need to look at the initial RelNode tree and take its own action: say, build 
>> a data structure to be used by planner rules, or enable or disable planner 
>> rules.
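
A minimal sketch of that idea, not Calcite code: walk the initial tree once,
collect the hints, and use them to decide which rules to register. The Node
class, the rule names, and the hint-to-rule mapping are all hypothetical
stand-ins chosen for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class HintDrivenRules {
  /** Hypothetical stand-in for a RelNode produced by sql-to-rel. */
  static final class Node {
    final String name;
    final Set<String> hints;
    final List<Node> inputs;

    Node(String name, Set<String> hints, Node... inputs) {
      this.name = name;
      this.hints = Set.copyOf(hints);
      this.inputs = List.of(inputs);
    }
  }

  /** Hypothetical mapping from a hint name to the rule it switches off. */
  static final Map<String, String> RULE_DISABLED_BY_HINT =
      Map.of("nohashjoin", "HashJoinRule");

  /** Walks the initial tree once and gathers every hint it carries. */
  static Set<String> collectHints(Node root) {
    Set<String> found = new LinkedHashSet<>();
    collect(root, found);
    return found;
  }

  private static void collect(Node node, Set<String> found) {
    found.addAll(node.hints);
    for (Node input : node.inputs) {
      collect(input, found);
    }
  }

  /** Returns the rules to register, minus any rule disabled by a hint. */
  static List<String> enabledRules(List<String> allRules, Node root) {
    Set<String> disabled = new LinkedHashSet<>();
    for (String hint : collectHints(root)) {
      String rule = RULE_DISABLED_BY_HINT.get(hint);
      if (rule != null) {
        disabled.add(rule);
      }
    }
    List<String> enabled = new ArrayList<>(allRules);
    enabled.removeAll(disabled);
    return enabled;
  }

  public static void main(String[] args) {
    Node scanA = new Node("ScanA", Set.of());
    Node scanB = new Node("ScanB", Set.of());
    Node join = new Node("Join", Set.of("nohashjoin"), scanA, scanB);
    List<String> rules =
        List.of("HashJoinRule", "MergeJoinRule", "NestedLoopJoinRule");
    System.out.println(enabledRules(rules, join));
  }
}
```

Nothing here is copied onto nodes created later during planning; the hints
are read once from the initial tree and turned into a planner setting.
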
>> 
>> 
>>> On Apr 23, 2019, at 9:31 PM, Chunwei Lei <[email protected]> wrote:
>>> 
>>> Thanks Danny.
>>> 
>>> Those are good points. I think it depends on what we consider a hint to be.
>>> IMHO, if we consider a hint to be a kind of metadata,
>>> it is not a good idea to store the hints in the RelNode instance.
>>> 
>>> 
>>> 
>>> Best,
>>> Chunwei
>>> 
>>> On Wed, Apr 24, 2019 at 11:09 AM Yuzhao Chen <[email protected]> wrote:
>>>> 
>>>> Thx, Julian
>>>> 
>>>> I think the hint path is a good way to search a RelNode’s parents. 
>>>> Broadly, these modules may need to be modified:
>>>> 
>>>> 1. Support the hints grammar in parser.jj.
>>>> 2. Cache the hints in the RelNode instance, and add a method like 
>>>> RelNode#getHints() to fetch all the hints inherited by this node.
>>>> 3. Modify the #copy method of every kind of RelNode so that the hints are 
>>>> copied when creating new equivalent nodes.
>>>> 4. Add a visitor after the sql-to-rel phase to set up the full hints list 
>>>> for every child RelNode, if any hints exist.
>>>> 5. Add a hints metadata handler that handles fetching and overriding 
>>>> hints for specific kinds of RelNode.
>>>> 
>>>> Items 2 and 3 are the modifications that I really want to confirm: 
>>>> should we store the hints in the RelNode instance?
>>>> 
>>>> These are initial thoughts. If we reach agreement, I will write a 
>>>> detailed design doc that contains:
>>>> 
>>>> 1. The hints grammar supported by the major SQL engines
>>>> 2. The hints grammar to be supported by Apache Calcite
>>>> 3. The interface and design ideas of the proposed modifications
>>>> 
>>>> 
>>>> Best,
>>>> Danny Chan
>>>> On Apr 24, 2019, at 3:04 AM (+0800), Julian Hyde <[email protected]> wrote:
>>>>> I see that if you have a hint on, say, the root node then it would be 
>>>>> nice for its child or grand-child to be able to see that hint.
>>>>> 
>>>>> How about giving each hint an inherit path? Thus given
>>>>> 
>>>>> Filter Hint1
>>>>>  +- Join
>>>>>      +- Scan
>>>>>      +- Project Hint2
>>>>>          +- Scan
>>>>> 
>>>>> 
>>>>> Filter would have hints {Hint1[]}
>>>>> Join would have hints {Hint1[0]}
>>>>> Scan would have hints {Hint1[0, 0]}
>>>>> Project would have hints {Hint1[0, 1], Hint2[]}
>>>>> Scan2 (the Scan under Project) would have hints {Hint1[0, 1, 0], Hint2[0]}
>>>>> 
>>>>> You could populate the hints and inherit paths with a single visitor pass 
>>>>> after sql-to-rel conversion.
>>>>> 
>>>>> By the way, I still like the idea of having hints as a kind of 
>>>>> RelMetadata, but I realize that a given RelNode might have more than one 
>>>>> hint. So I think that the getHints(RelNode) method would return a 
>>>>> List<Hint>, with Hint as follows:
>>>>> 
>>>>> class Hint {
>>>>>   public final List<Integer> inheritPath; // immutable, not null
>>>>>   public final String type; // not null
>>>>>   public final Object operand; // immutable, may be null, must be JSON data
>>>>> }
>>>>> 
>>>>> operand must be JSON-style data (null, boolean, number, String, immutable 
>>>>> List of JSON data, or immutable order-preserving Map from String to JSON 
>>>>> data).
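
A minimal sketch of the single visitor pass, assuming a hypothetical PlanNode
class in place of RelNode. It is not Calcite code. The Hint class below keeps
only inheritPath and type (operand is left out for brevity), and the result is
keyed by node name purely to keep the example short, so names must be unique.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class HintPaths {
  /** Single top-down pass: each node sees its own hints plus inherited ones. */
  static Map<String, List<Hint>> populate(PlanNode root) {
    Map<String, List<Hint>> result = new LinkedHashMap<>();
    visit(root, Collections.emptyList(), result);
    return result;
  }

  private static void visit(
      PlanNode node, List<Hint> inherited, Map<String, List<Hint>> out) {
    List<Hint> hints = new ArrayList<>(inherited);
    for (String type : node.declaredHints) {
      // A hint declared on this node has an empty inherit path.
      hints.add(new Hint(Collections.emptyList(), type));
    }
    out.put(node.name, hints);
    for (int i = 0; i < node.inputs.size(); i++) {
      List<Hint> forInput = new ArrayList<>();
      for (Hint hint : hints) {
        forInput.add(hint.inheritedBy(i));
      }
      visit(node.inputs.get(i), forInput, out);
    }
  }

  public static void main(String[] args) {
    PlanNode scan2 = new PlanNode("Scan2", List.of());
    PlanNode project = new PlanNode("Project", List.of("Hint2"), scan2);
    PlanNode scan = new PlanNode("Scan", List.of());
    PlanNode join = new PlanNode("Join", List.of(), scan, project);
    PlanNode filter = new PlanNode("Filter", List.of("Hint1"), join);
    populate(filter).forEach(
        (name, hints) -> System.out.println(name + " " + hints));
  }
}

/** Hypothetical stand-in for a RelNode: name, declared hints, ordered inputs. */
final class PlanNode {
  final String name;
  final List<String> declaredHints;
  final List<PlanNode> inputs;

  PlanNode(String name, List<String> declaredHints, PlanNode... inputs) {
    this.name = name;
    this.declaredHints = List.copyOf(declaredHints);
    this.inputs = List.of(inputs);
  }
}

/** A hint plus the input ordinals leading from its declaring node to here. */
final class Hint {
  final List<Integer> inheritPath; // immutable, not null
  final String type;               // not null

  Hint(List<Integer> inheritPath, String type) {
    this.inheritPath = List.copyOf(inheritPath);
    this.type = type;
  }

  /** Returns this hint as seen by the input at the given ordinal. */
  Hint inheritedBy(int ordinal) {
    List<Integer> path = new ArrayList<>(inheritPath);
    path.add(ordinal);
    return new Hint(path, type);
  }

  @Override public String toString() {
    return type + inheritPath;
  }
}
```

Each step down appends the ordinal of the input being entered, so the length
of a path tells how far the hint has travelled from the node that declared it.
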
>>>>> 
>>>>>> On Apr 23, 2019, at 1:25 AM, Yuzhao Chen <[email protected]> wrote:
>>>>>> 
>>>>>> Thx, Andrew
>>>>>> 
>>>>>> I don’t want to have a custom RelNode class; I hope all the work on 
>>>>>> hints can be contributed to the community. I want to find an 
>>>>>> acceptable way to keep and propagate the hints if we use the 
>>>>>> MetadataHandler to cache and query them.
>>>>>> 
>>>>>> I don’t think the hints should be mixed into the cost model. That would 
>>>>>> make the cost computation very complex and hard to maintain. We only 
>>>>>> need the hints in the planning phase to give suggestions; to me, hints 
>>>>>> are more like another guideline, and transparent to the planner.
>>>>>> 
>>>>>> Best,
>>>>>> Danny Chan
>>>>>> On Apr 23, 2019, at 2:24 PM (+0800), Андрей Цвелодуб <[email protected]> wrote:
>>>>>>> Hi Danny,
>>>>>>> 
>>>>>>> I would also agree with Julian on his position. I've tried to get around
>>>>>>> this limitation in several different ways, but none of them ended well :)
>>>>>>> 
>>>>>>> For your idea with hints, if you have custom RelNode classes, you can add
>>>>>>> the hint as an additional field of the class and write a simple rule
>>>>>>> that propagates the hint downwards, step by step. You can also include the
>>>>>>> hint in your cost estimation, so that nodes with hints are more attractive
>>>>>>> to the planner. I'm not sure this is the most correct way to use the
>>>>>>> cost mechanism, but at least it is straightforward and it works.
>>>>>>> 
>>>>>>> Best Regards,
>>>>>>> Andrew Tsvelodub
>>>>>>> 
>>>>>>> On Tue, 23 Apr 2019 at 08:44, Yuzhao Chen <[email protected]> wrote:
>>>>>>> 
>>>>>>>> Julian,
>>>>>>>> 
>>>>>>>> I want to add hint support to Calcite. The initial idea was to tag a
>>>>>>>> RelNode (transformed from a SqlNode with a hint) with a hint attribute (or
>>>>>>>> trait). I hope that its children (inputs) can then see this hint and
>>>>>>>> decide whether to consume or propagate it.
>>>>>>>> 
>>>>>>>> The problem I have is that traits propagate from the inputs, which is
>>>>>>>> the opposite of what I need. Can you give some suggestions? If I use a
>>>>>>>> MetadataHandler to cache and propagate the hints, how do I propagate from
>>>>>>>> parents to children?
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Danny Chan
>>>>>>>> On Apr 23, 2019, at 3:14 AM (+0800), Julian Hyde <[email protected]> wrote:
>>>>>>>>> TL;DR: RelNodes don’t really have parents. Be careful if you are
>>>>>>>>> relying on the parent concept too much. Rely on rules instead.
>>>>>>>>> 
>>>>>>>>> In the Volcano model, a RelNode doesn’t really have a parent. It might
>>>>>>>>> be used in several places. (RelSet has a field ‘List<RelNode> parents’
>>>>>>>>> that is kept up to date as planning progresses. But it’s really for
>>>>>>>>> Volcano’s internal use.)
>>>>>>>>> 
>>>>>>>>> Even if you are not using Volcano, there are reasons to want the
>>>>>>>>> RelNode graph to be a DAG, so again, a RelNode doesn’t have a unique
>>>>>>>>> parent.
>>>>>>>>> 
>>>>>>>>> RelShuttleImpl has a stack. You can use that to find the parent. But
>>>>>>>>> the “parent” is just “where we came from as we traversed the RelNode
>>>>>>>>> graph”. There may be other “parents” that you do not know about.
>>>>>>>>> 
>>>>>>>>> If you have a Project and want to find all parents that are Filters,
>>>>>>>>> don’t even think about “iterating over the parents” of the Project.
>>>>>>>>> Just write a rule that matches a Filter on a Project, and trust Volcano
>>>>>>>>> to do its job.
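
A small sketch of why "the parent" is not well defined once the graph is a
DAG. The Node class is a hypothetical stand-in for RelNode, and this is not
how Volcano tracks parents; it only shows one shared node ending up with two
consumers, of which a traversal stack would show just one at a time.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

final class DagParents {
  /** Hypothetical stand-in for a RelNode: a name and ordered inputs. */
  static final class Node {
    final String name;
    final List<Node> inputs;

    Node(String name, Node... inputs) {
      this.name = name;
      this.inputs = List.of(inputs);
    }
  }

  /** Records, for every reachable node, each consumer that uses it as input. */
  static Map<Node, List<Node>> parentsOf(Node root) {
    // Identity-based map: two nodes are the same only if they are one object.
    Map<Node, List<Node>> parents = new IdentityHashMap<>();
    parents.put(root, new ArrayList<>());
    visit(root, parents);
    return parents;
  }

  private static void visit(Node node, Map<Node, List<Node>> parents) {
    for (Node input : node.inputs) {
      List<Node> consumers = parents.get(input);
      boolean firstVisit = consumers == null;
      if (firstVisit) {
        consumers = new ArrayList<>();
        parents.put(input, consumers);
      }
      consumers.add(node);
      if (firstVisit) {
        // Descend only once per node, even if it is shared.
        visit(input, parents);
      }
    }
  }

  public static void main(String[] args) {
    Node scan = new Node("Scan");
    Node filter = new Node("Filter", scan);   // first consumer of scan
    Node project = new Node("Project", scan); // second consumer of scan
    Node union = new Node("Union", filter, project);
    for (Node parent : parentsOf(union).get(scan)) {
      System.out.println("Scan is an input of " + parent.name);
    }
  }
}
```

Even this map only covers consumers reachable from the chosen root; nodes
created elsewhere during planning could use the same Scan without appearing.
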
>>>>>>>>> 
>>>>>>>>> Julian
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Apr 22, 2019, at 6:15 AM, Yuzhao Chen <[email protected]> wrote:
>>>>>>>>>> 
>>>>>>>>>> Thx, Stamatis, that makes some sense: I could pass the parent node
>>>>>>>>>> around every time I visit a RelNode and keep the parents in a cache,
>>>>>>>>>> but it is still not that intuitive. Actually, I want to add a new
>>>>>>>>>> RelTrait that is bound to a specific scope, for example:
>>>>>>>>>> 
>>>>>>>>>>     join-rel(trait1)
>>>>>>>>>>      /      \
>>>>>>>>>>   join2    join3
>>>>>>>>>> 
>>>>>>>>>> join-rel has a trait trait1, and I want all the children of join-rel
>>>>>>>>>> to be able to see this trait. With Calcite’s default metadata handler,
>>>>>>>>>> I can only see the traits of child nodes (traits propagate from the
>>>>>>>>>> inputs), and I have no idea how to propagate a trait in the reverse
>>>>>>>>>> direction.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> Danny Chan
>>>>>>>>>> On Apr 22, 2019, at 8:44 PM (+0800), Stamatis Zampetakis <[email protected]> wrote:
>>>>>>>>>>> Hi Danny,
>>>>>>>>>>> 
>>>>>>>>>>> Apart from RelShuttle there is also RelVisitor, which has a visit
>>>>>>>>>>> method that provides the parent [1]. Not sure if it suits your needs.
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Stamatis
>>>>>>>>>>> 
>>>>>>>>>>> [1]
>>>>>>>>>>> https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/RelVisitor.java#L43
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Mon, Apr 22, 2019 at 2:14 PM Yuzhao Chen <[email protected]> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Now for RelNode, we have the method getInput() [1] to fetch the input
>>>>>>>>>>>> RelNodes, but how do we fetch the parent?
>>>>>>>>>>>> 
>>>>>>>>>>>> For example, we have the plan:
>>>>>>>>>>>> 
>>>>>>>>>>>>    join-rel
>>>>>>>>>>>>    /     \
>>>>>>>>>>>> scan1   scan2
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> We can get scan1 and scan2 from join-rel directly with the method
>>>>>>>>>>>> getInput, but how can we get join-rel from scan1 and scan2?
>>>>>>>>>>>> 
>>>>>>>>>>>> I know that there is a RelShuttle that can visit every RelNode, and if
>>>>>>>>>>>> I keep a cache of the input mappings, I can eventually get the
>>>>>>>>>>>> ‘parents’ from the cache, but this is tedious code and not that
>>>>>>>>>>>> intuitive.
>>>>>>>>>>>> 
>>>>>>>>>>>> Do you guys have any good ideas?
>>>>>>>>>>>> 
>>>>>>>>>>>> [1]
>>>>>>>>>>>> https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/RelNode.java#L132
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Danny Chan
>>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>> 
>> 
