[ https://issues.apache.org/jira/browse/CALCITE-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16995369#comment-16995369 ]
Rui Wang edited comment on CALCITE-3594 at 12/14/19 4:38 AM:
-------------------------------------------------------------

[~danny0405] first of all, thanks for your detailed [Hint doc|https://docs.google.com/document/d/1mykz-w2t1Yw7CH6NjUWpWqCAf_6YNKxSc59gXafrNCs/edit#]. I learned a lot from it.

{quote}define a hint keyword (We may some reference with other engines to give it a readable name){quote}
I think you are asking whether there is a better name, or a common name used by other engines, for "hot_key"?

{quote}define the hint options(either a simple identifier list or k-v list), need to make the options as much common to use{quote}
This is a tricky one. I see that there is support for a kv list where the key is a SqlIdentifier and the value is a string literal. In the hot key case, we will need a list of kv pairs where the key is a literal, because group-by keys are columns that can have different types. So my current thought is to add support for a list of Literal:Literal kv pairs as a hint's body. What do you think?

{quote}define the hint strategy for the hint item (see if you need some special conditions for the relational expressions){quote}
Agreed. In [HintStrategies|https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/hint/HintStrategies.java#L22], there is no strategy for aggregation; I should probably add one. Aggregate should probably implement the Hintable interface as well.

Lastly, I believe the parser has no support for parsing a hint in GROUP BY, so I will probably need to add that too.

was (Author: amaliujia):
[~danny0405] first of all, thanks for your detailed [Hint doc|https://docs.google.com/document/d/1mykz-w2t1Yw7CH6NjUWpWqCAf_6YNKxSc59gXafrNCs/edit#]. I learned a lot from it.

{quote}define a hint keyword (We may some reference with other engines to give it a readable name){quote}
I think you are asking whether there is a better name, or a common name used by other engines, for "hot_key"?

{quote}define the hint options(either a simple identifier list or k-v list), need to make the options as much common to use{quote}
This is a tricky one. I see that there is support for a kv list where the key is a SqlIdentifier and the value is a string literal. In the hot key case, we will need a list of kv pairs where the key is a literal, because group-by keys can have different types, so the value should have different types. So my current thought is to add support for a list of Literal:Literal kv pairs as a hint's body. What do you think?

{quote}define the hint strategy for the hint item (see if you need some special conditions for the relational expressions){quote}
Agreed. In [HintStrategies|https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/hint/HintStrategies.java#L22], there is no strategy for aggregation; I should probably add one.

Lastly, I believe the parser has no support for parsing a hint in GROUP BY, so I will probably need to add that too.

> Support hot Groupby keys hint
> -----------------------------
>
>                 Key: CALCITE-3594
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3594
>             Project: Calcite
>          Issue Type: Sub-task
>            Reporter: Rui Wang
>            Assignee: Rui Wang
>            Priority: Major
>
> It will be useful for Apache Beam if we support the following SqlHint:
> SELECT * FROM t
> GROUP BY t.key_column /*+ hot_key(key1=fanout_factor, ...) */
> The hot key strategy works on aggregation, and it provides a list of hot keys
> with a fanout factor for a column. The fanout factor says how many partitions
> should be created for that specific key, so that we can compute a
> per-partition aggregate and then a final aggregate. One example to explain
> it:
> SELECT * FROM t
> GROUP BY t.key_column /*+ hot_key("value1"=2) */
> // for key_column, the value "value1" appears so many times (so it's hot);
> // please consider splitting it into two partitions and processing them
> // separately.
> This problem is common in big data processing, where a hot key makes one
> machine the slowest, which either slows down the whole pipeline or causes
> retries. In such cases, one common resolution is to split the data into
> multiple partitions, aggregate per partition, and then do a final combine.
> Usually the execution engine won't know which keys are hot. SqlHint provides
> a good way to tell the engine which keys need this treatment.
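
For illustration, the split-then-combine approach described in the issue can be sketched in plain Java, independent of Calcite and Beam. Everything named here (HotKeyFanoutSketch, partialCounts, finalCounts) is hypothetical, the round-robin assignment of rows to sub-partitions is an assumption, and COUNT stands in for an arbitrary aggregate; this is not how either project implements it.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HotKeyFanoutSketch {

  // Stage 1: per-partition aggregate. Each key gets `fanout` sub-partitions
  // (1 unless the hint names it); rows are spread over them round-robin.
  public static Map<String, long[]> partialCounts(
      List<String> keys, Map<String, Integer> fanoutByHotKey) {
    Map<String, long[]> partial = new HashMap<>();
    Map<String, Integer> rowsSeen = new HashMap<>();
    for (String key : keys) {
      // Guard against a non-positive fanout factor in the hint.
      int fanout = Math.max(1, fanoutByHotKey.getOrDefault(key, 1));
      int shard = rowsSeen.merge(key, 1, Integer::sum) % fanout;
      partial.computeIfAbsent(key, k -> new long[fanout])[shard]++;
    }
    return partial;
  }

  // Stage 2: final aggregate. Combine the sub-partition counts of each key.
  public static Map<String, Long> finalCounts(Map<String, long[]> partial) {
    Map<String, Long> result = new HashMap<>();
    partial.forEach((key, shards) -> result.put(key, Arrays.stream(shards).sum()));
    return result;
  }

  public static void main(String[] args) {
    // Mirrors hot_key("value1"=2): split "value1" into two sub-partitions.
    List<String> keys = List.of("value1", "value1", "value1", "value1", "other");
    Map<String, Integer> hint = Map.of("value1", 2);

    Map<String, long[]> partial = partialCounts(keys, hint);
    System.out.println("value1 partials: " + Arrays.toString(partial.get("value1")));
    System.out.println("final counts: " + finalCounts(partial));
  }
}
```

In a distributed engine, each sub-partition of a hot key could be handled by a different worker, so no single worker has to process every row for "value1"; only the small per-partition results meet in the final combine.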