[ 
https://issues.apache.org/jira/browse/CALCITE-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16995369#comment-16995369
 ] 

Rui Wang edited comment on CALCITE-3594 at 12/14/19 4:38 AM:
-------------------------------------------------------------

[~danny0405] first of all, thanks for your detailed written [Hint 
doc|https://docs.google.com/document/d/1mykz-w2t1Yw7CH6NjUWpWqCAf_6YNKxSc59gXafrNCs/edit#].
 I learned a lot from it.

{quote}define a hint keyword (We may some reference with other engines to give 
it a readable name){quote}
I think you are saying if there is a better name or a common name used by other 
engines for "hot_key"?

{quote}define the hint options(either a simple identifier list or k-v list), 
need to make the options as much common to use{quote}
This is a tricky one. I see that there is a support for kv list where key is 
SqlIdentifier and value is just a string literal. In the hot key case, we will 
need a list of kv pair where key should be a literal. It is because groupby 
keys are columns that can have different types. So my current thought is adding 
support of list of kv pair: Literal:Literal, as a hint's body. What do you 
think?

{quote}define the hint strategy for the hint item (see if you need some special 
conditions for the relational expressions){quote}
Agreed. In 
[HintStrategies|https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/hint/HintStrategies.java#L22],
 there is no a strategy for aggregation, I probably should add one. Also 
Aggregrate might should implement Hintable interface as well.



Lastly, in parser I believe there is no support to parse a hint in group by. I 
will probably also need to add one.


was (Author: amaliujia):
[~danny0405] first of all, thanks for your detailed written [Hint 
doc|https://docs.google.com/document/d/1mykz-w2t1Yw7CH6NjUWpWqCAf_6YNKxSc59gXafrNCs/edit#].
 I learned a lot from it.

{quote}define a hint keyword (We may some reference with other engines to give 
it a readable name){quote}
I think you are saying if there is a better name or a common name used by other 
engines for "hot_key"?

{quote}define the hint options(either a simple identifier list or k-v list), 
need to make the options as much common to use{quote}
This is a tricky one. I see that there is a support for kv list where key is 
SqlIdentifier and value is just a string literal. In the hot key case, we will 
need a list of kv pair where key should be a literal. It is because groupby 
keys can have different types so the value should have different type. So my 
current thought is adding support of list of kv pair: Literal:Literal, as a 
hint's body. What do you think?

{quote}define the hint strategy for the hint item (see if you need some special 
conditions for the relational expressions){quote}
Agreed. In 
[HintStrategies|https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/hint/HintStrategies.java#L22],
 there is no a strategy for aggregation, I probably should add one.



Lastly, in parser I believe there is no support to parse a hint in group by. I 
will probably also need to add one.

> Support hot Groupby keys hint
> -----------------------------
>
>                 Key: CALCITE-3594
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3594
>             Project: Calcite
>          Issue Type: Sub-task
>            Reporter: Rui Wang
>            Assignee: Rui Wang
>            Priority: Major
>
> It will be useful for Apache Beam if we support the following SqlHint:
> SELECT * FROM t
> GROUP BY t.key_column /* + hot_key(key1=fanout_factor, ...) */)
> The hot key strategy works on aggregation and it provides a list of hot keys 
> with fanout factor for a column. The fanout factor says how many partition 
> should be created for that specific key, such that we can have a per 
> partition aggregate and then have a final aggregate. One example to explain 
> it:
> SELECT * FROM t
> GROUP BY t.key_column /* + hot_key("value1"=2) */)
> // for the key_column, there is a "value1" which appear so many times (so 
> it's hot), please consider split it into two partition and process separately.
> Such problem is common for big data processing, where hot key creates slowest 
> machine which either slow down the whole pipeline or make retries. In such 
> case, one common resolution is to split data to multiple partition and 
> aggregate per partition, and then have a final combine. 
> Usually execution engine won't know what is the hot key(s). SqlHint provides 
> a good way to tell the engine which key is useful to deal with it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to