This is a very interesting topic, thanks for starting this discussion Danny!

Metadata seems a very reasonable way to handle hints and I guess the
proposal by Julian is the way to go.

For the sake of discussion (brainstorming), I was thinking that rules from
hints do not differ too much.
The final goal is to allow end-users to affect the planning process for a
given query towards a particular plan. I guess this could be achieved by
registering custom rules that are taken into account by the planner (and
can be possibly ignored).

I admit that passing hints alongside the SQL query is convenient for
end-users so most likely we cannot avoid changing the grammar. Following
the rules idea, I was wondering if instead of attaching hints to the
SqlNode/RelNode(s) as metadata or so we could rather
enable/disable/generate rules for the planner.

A small example below:

/* +EnumerableHashJoinRule(e1,d1) */
SELECT e1.name
FROM emp e1
INNER JOIN dept d1 ON e1.did=d1.id
WHERE d1.name = 'HR'

The rule matches when the operator is a Join.class and left child is
aliased to e1 and right child is aliased to d1; it transforms the
LogicalJoin to an EnumerableHashJoin.

I haven't thought a lot on this but maybe this hint/rule based approach
gives some inspiration to somebody in this list.

Best,
Stamatis

On Sat, Apr 27, 2019 at 1:40 AM Yuzhao Chen <[email protected]> wrote:

> Thx Julian
>
> Mostly got your idea, but one thing needs to confirm:
>
> Now the MetadataHandler is kind of query lazy the cache is code-gen ed in
> the handler class, the metadata also propagate from inputs, when I got an
> RelNode’s hint, how can I cache it in the metadata handler for querying ?
>
> Best,
> Danny Chan
> 在 2019年4月27日 +0800 AM6:57,Julian Hyde <[email protected]>,写道:
> > The RelMetadata system is designed for these kinds of annotations - if
> there is a “global hints cache” there’s no benefit to doing it outside the
> RelMetadata system.
> >
> > That said, I don’t know (and I don’t think anyone knows) how we want
> hints to be propagated as we generate RelNodes from RelNodes. I think we
> should focus on really simple cases first (e.g. hints about the whole
> query, or about particular table scans), and not try to automatically
> propagate them.
> >
> > We can make the hints propagation mechanism more sophisticated when we
> have an actual use case to drive us.
> >
> > Julian
> >
> >
> > > On Apr 26, 2019, at 3:41 PM, Yuzhao Chen <[email protected]> wrote:
> > >
> > > Thx, Julian
> > >
> > > Let me repeat my thoughts about the details again, in order to
> implement the hints, maybe these things are needed:
> > >
> > > The main diff is that we will maintain a global hints cache
> > > 1. Supports hints grammar for parser.jj
> > > 2. During/after sql-to-rel, we may pass a hints cache to the
> SqlToRelConverter, there is a visitor to setup/init the RelNodes’hints to
> the cache once at a time, this cache scope is global and would be active
> the whole query planning time. The cache only keep hints for few top nodes
> that really needs
> > > 3. In the Planner, add set/get hints cache method, so that in the
> planning rules, we can see the hints cache,
> > > And we can also ban some rule matching in the planner
> > > 4. Hook the RelOptCall#transformTo method to handle logic of hints
> propagating(invoke the hints logic again same as sql-to-rel phrase), this
> will also update the global hints cache
> > > It seems that given the global hints cache, we do not need the
> MetaDataHandler any more, this is the thing I most want to make sure.
> > > Hope for your suggestions.
> > >
> > > Best,
> > > Danny Chan
> > > 在 2019年4月25日 +0800 AM3:07,Julian Hyde <[email protected]>,写道:
> > > > I think it’s OK to attach hints to the (few) RelNodes that come out
> of the SqlToRelConverter.
> > > >
> > > > But it would be a mistake to try to propagate those hints to all of
> the RelNodes that are created during query planning. Even if we changed all
> of the copy methods (a huge task) there are many other ways that RelNodes
> get created. We would end up with a RelNode graph with lots of hints, and
> most of those hints would be inaccurate or not applicable.
> > > >
> > > > For a particular hint, say "/*+ nohashjoin */“, some piece of code
> would need to look at the initial RelNode tree and take its own action:
> say, build a data structure to be used by planner rules, or enable or
> disable planner rules.
> > > >
> > > >
> > > > > On Apr 23, 2019, at 9:31 PM, Chunwei Lei <[email protected]>
> wrote:
> > > > >
> > > > > Thanks Danny.
> > > > >
> > > > > Those are good points. I think it depends on what we consider hint
> as.
> > > > > IMHO, if we consider hint as a kind of metadata,
> > > > > it is not a good idea to store the hints in the RelNode instance.
> > > > >
> > > > >
> > > > >
> > > > > Best,
> > > > > Chunwei
> > > > >
> > > > > On Wed, Apr 24, 2019 at 11:09 AM Yuzhao Chen <[email protected]>
> wrote:
> > > > > >
> > > > > > Thx, Julian
> > > > > >
> > > > > > I think the hint path is a good way for searching RelNode’s
> parents, broadly, there may be these modules/things need to be modified:
> > > > > >
> > > > > > 1. Supports hints grammar for parser.jj
> > > > > > 2. Cache the hints in the RelNode instance, and add method like
> RelNode#getHints() to fetch all the hints inherited for this node.
> > > > > > 3. Modify #copy method for every kind of RelNode so that the
> hints can be copied when creating new equivalent nodes.
> > > > > > 4. Add a visitor in after sql-to-rel phrase, to set up full
> hints list for every children RelNode if there exists any.
> > > > > > 5. Add hints metadata handler and handles the hints fetching and
> overriding for specific kind of RelNode
> > > > > >
> > > > > > The 2 and 3 are the modifications that i really want to confirm,
> that is, shall we store the hints in the RelNode instance ?
> > > > > >
> > > > > > These are initial thoughts and if we make agreement, I would
> output a detail design doc which contains:
> > > > > >
> > > > > > 1. The hints grammar supported for the major sql engines
> > > > > > 2. The hints grammar supported for Apache Calcite
> > > > > > 3. The interface and design ideas of the proposed modifications
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > > Danny Chan
> > > > > > 在 2019年4月24日 +0800 AM3:04,Julian Hyde <[email protected]>,写道:
> > > > > > > I see that if you have a hint on, say, the root node then it
> would be nice for its child or grand-child to be able to see that hint.
> > > > > > >
> > > > > > > How about giving each hint an inherit path? Thus given
> > > > > > >
> > > > > > > Filter Hint1
> > > > > > > +- Join
> > > > > > > +- Scan
> > > > > > > +- Project Hint2
> > > > > > > +- Scan
> > > > > > >
> > > > > > >
> > > > > > > Filter would have hints {Hint1[]}
> > > > > > > Join would have hints {Hint1[0]}
> > > > > > > Scan would have hints {Hint1[0, 0]}
> > > > > > > Project would have hints {Hint1[0,1], Hint2}
> > > > > > > Scan2 would have hints {[Hint1[0, 0, 1, 0], Hint2[0]}
> > > > > > >
> > > > > > > You could populate the hints and inherit paths with a single
> visitor pass after sql-to-rel conversion.
> > > > > > >
> > > > > > > By the way, I still like the idea of having kinds as a kind of
> RelMetadata, but I realize that a given RelNode might have more than one
> hint. So I think that the getHints(RelNode) method would return a
> List<Hint>, with Hint as follows:
> > > > > > >
> > > > > > > class Hint {
> > > > > > > public final List<Integer> inheritPath; // immutable, not null
> > > > > > > public final String type; // not null
> > > > > > > public final Object operand; // immutable, may be null, must
> be JSON data
> > > > > > > }
> > > > > > >
> > > > > > > operand must be JSON-style data (null, boolean, number,
> String, immutable List of JSON data, or immutable order-preserving Map from
> String to JSON data).
> > > > > > >
> > > > > > > > On Apr 23, 2019, at 1:25 AM, Yuzhao Chen <
> [email protected]> wrote:
> > > > > > > >
> > > > > > > > Thx, Andrew
> > > > > > > >
> > > > > > > > I don’t want to have a custom RelNode class, I hope all the
> work about hints would be contributed to the community. I want to find an
> acceptable way to keep and propagate the hints if we use the
> MetadataHandler to cache and query the hints.
> > > > > > > >
> > > > > > > > I don’t think the hints should be mixed into the cost model,
> that would make the cost computation very complex and hard to maintain, we
> only need the hints in our planning phrase to give suggestions, hints is
> more like another guideline for me and transparent to the planner.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Danny Chan
> > > > > > > > 在 2019年4月23日 +0800 PM2:24,Андрей Цвелодуб <
> [email protected]>,写道:
> > > > > > > > > Hi Danny,
> > > > > > > > >
> > > > > > > > > I would also agree with Julian on his position. I've tried
> to get around
> > > > > > > > > this limitation in several different ways, but none of it
> ended well :)
> > > > > > > > >
> > > > > > > > > For your idea with hints, if you have custom RelNode
> classes, you can add
> > > > > > > > > hint as an additional field of the class and you can write
> a simple rule
> > > > > > > > > that propagates the hint downwards, step by step. And also
> include the hint
> > > > > > > > > in your cost estimation, so that nodes with hints would be
> more attractive
> > > > > > > > > to the planner. I'm not sure this would be the most
> correct way to use the
> > > > > > > > > cost mechanism, but at least it is straightforward and it
> works.
> > > > > > > > >
> > > > > > > > > Best Regards,
> > > > > > > > > Andrew Tsvelodub
> > > > > > > > >
> > > > > > > > > On Tue, 23 Apr 2019 at 08:44, Yuzhao Chen <
> [email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Julian,
> > > > > > > > > >
> > > > > > > > > > I want to add hint support for Calcite, the initial idea
> was to tag a
> > > > > > > > > > RelNode(transformed from a SqlNode with hint) with a hit
> attribute(or
> > > > > > > > > > trait), then I hope that the children (inputs) of it can
> see this hint, so
> > > > > > > > > > to make some decisions if it should consume or propagate
> the hint.
> > > > > > > > > >
> > > > > > > > > > The problem I got here is the trait propagate from
> inputs from, which is
> > > > > > > > > > the opposite as what I need, can you give some
> suggestions ? If I use
> > > > > > > > > > MetadataHandler to cache and propagate the hints, how to
> propagate from
> > > > > > > > > > parents to children ?
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Danny Chan
> > > > > > > > > > 在 2019年4月23日 +0800 AM3:14,Julian Hyde <[email protected]
> >,写道:
> > > > > > > > > > > TL;DR: RelNodes don’t really have parents. Be careful
> if you are relying
> > > > > > > > > > on the parent concept too much. Rely on rules instead.
> > > > > > > > > > >
> > > > > > > > > > > In the Volcano model, a RelNode doesn’t really have a
> parent. It might
> > > > > > > > > > be used in several places. (RelSet has a field
> ‘List<RelNode> parents’ that
> > > > > > > > > > is kept up to date as planing progresses. But it’s
> really for Volcano’s
> > > > > > > > > > internal use.)
> > > > > > > > > > >
> > > > > > > > > > > Even if you are not using Volcano, there are reasons
> to want the RelNode
> > > > > > > > > > graph to be a dag, so again, a RelNode doesn’t have a
> unique parent.
> > > > > > > > > > >
> > > > > > > > > > > RelShuttleImpl has a stack. You can use that to find
> the parent. But the
> > > > > > > > > > “parent” is just “where we came from as we traversed the
> RelNode graph”.
> > > > > > > > > > There may be other “parents” that you do not know about.
> > > > > > > > > > >
> > > > > > > > > > > If you have a Project and want to find all parents
> that are Filters,
> > > > > > > > > > don’t even think about “iterating over the parents” of
> the Project. Just
> > > > > > > > > > write a rule that matches a Filter on a Project, and
> trust Volcano to do
> > > > > > > > > > its job.
> > > > > > > > > > >
> > > > > > > > > > > Julian
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > On Apr 22, 2019, at 6:15 AM, Yuzhao Chen <
> [email protected]> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Thx, Stamatis, that somehow make sense, if i pass
> around the parent
> > > > > > > > > > node every time I visit a RelNode and keep the parents
> in the cache, but it
> > > > > > > > > > is still not that intuitive. Actually I what a to add a
> new RelTrait which
> > > > > > > > > > bind to a specific scope, for example:
> > > > > > > > > > > >
> > > > > > > > > > > > join-rel(trait1)
> > > > > > > > > > > > / \
> > > > > > > > > > > > join2 join3
> > > > > > > > > > > >
> > > > > > > > > > > > Join-rel has a trait trait1, and I want all the
> children of join-rel
> > > > > > > > > > can see this trait, with Calcite’s default metadata
> handler, I can only see
> > > > > > > > > > the trait from children nodes(traits propagate from the
> inputs), and I have
> > > > > > > > > > no idea how to propagate a trait reversely?
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Danny Chan
> > > > > > > > > > > > 在 2019年4月22日 +0800 PM8:44,Stamatis Zampetakis <
> [email protected]>,写道:
> > > > > > > > > > > > > Hi Danny,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Apart from RelShuttle there is also RelVisitor
> which has a visit
> > > > > > > > > > method
> > > > > > > > > > > > > that provides the parent [1]. Not sure, if it
> suits your needs.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Stamatis
> > > > > > > > > > > > >
> > > > > > > > > > > > > [1]
> > > > > > > > > > > > >
> > > > > > > > > >
> https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/RelVisitor.java#L43
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Apr 22, 2019 at 2:14 PM Yuzhao Chen <
> [email protected]>
> > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Now for RelNode, we have method getInput()[1] to
> fetch the input
> > > > > > > > > > > > > > RelNodes, but how we fetch the parent ?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > For example, we have plan:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > join-rel
> > > > > > > > > > > > > > / \
> > > > > > > > > > > > > > scan1 scan2
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > We can get scan1 and scan2 in join-rel directly
> with method
> > > > > > > > > > getInput, but
> > > > > > > > > > > > > > how can we get the join rel in scan1 and scan 2 ?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I know that there is a RelShuttle that can visit
> every RelNode and
> > > > > > > > > > if I
> > > > > > > > > > > > > > make a cache for the inputs mapping, finally I
> can get the
> > > > > > > > > > ‘parents’ from
> > > > > > > > > > > > > > the cache, but this is boring code and not that
> intuitive.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Do you guys have any good ideas ?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > >
> > > > > > > > > >
> https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/RelNode.java#L132
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > Danny Chan
> > > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > >
> > > >
> >
>

Reply via email to