[ https://issues.apache.org/jira/browse/CALCITE-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16994046#comment-16994046 ]
Danny Chen commented on CALCITE-3594: ------------------------------------- Thanks for firing this issue [~amaliujia] ~, currently, there is no any hints item implementations in Calcite, i'm glad that we now began to add some builtin items To add one, we may: - define a hint keyword (We may some reference with other engines to give it a readable name) - define the hint options(either a simple identifier list or k-v list), need to make the options as much common to use - define the hint strategy for the hint item (see if you need some special conditions for the relational expressions) > Support hot Groupby keys hint > ----------------------------- > > Key: CALCITE-3594 > URL: https://issues.apache.org/jira/browse/CALCITE-3594 > Project: Calcite > Issue Type: Sub-task > Reporter: Rui Wang > Assignee: Rui Wang > Priority: Major > > It will be useful for Apache Beam if we support the following SqlHint: > SELECT * FROM t > GROUP BY t.key_column /* + hot_key(key1=fanout_factor, ...) */) > The hot key strategy works on aggregation and it provides a list of hot keys > with fanout factor for a column. The fanout factor says how many partition > should be created for that specific key, such that we can have a per > partition aggregate and then have a final aggregate. One example to explain > it: > SELECT * FROM t > GROUP BY t.key_column /* + hot_key("value1"=2) */) > // for the key_column, there is a "value1" which appear so many times (so > it's hot), please consider split it into two partition and process separately. > Such problem is common for big data processing, where hot key creates slowest > machine which either slow down the whole pipeline or make retries. In such > case, one common resolution is to split data to multiple partition and > aggregate per partition, and then have a final combine. > Usually execution engine won't know what is the hot key(s). SqlHint provides > a good way to tell the engine which key is useful to deal with it. -- This message was sent by Atlassian Jira (v8.3.4#803005)