[
https://issues.apache.org/jira/browse/FLINK-7234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16096403#comment-16096403
]
ASF GitHub Bot commented on FLINK-7234:
---------------------------------------
Github user fhueske commented on the issue:
https://github.com/apache/flink/pull/4372
I think you are right, @greghogan.
It's not about the ratio of #distinct keys to the size of the dataset. But it's
also not only about the ratio of #distinct keys to the size of the memory. The
skew of the key distribution has an effect as well (hash-based combiners should
handle skew better than sort-based combiners).
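To illustrate the skew argument, here is a rough, self-contained sketch. It is
not Flink's combiner code: the class CombinerSketch and its methods are
hypothetical, and it assumes that one hash-table entry and one buffered record
cost the same amount of memory. The "hash" variant keeps one partial aggregate
per distinct key and flushes when a new key no longer fits; the "sort" variant
buffers raw records and flushes whenever the buffer is full. Each method
returns how many records the combiner would emit downstream.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model, not Flink's implementation.
class CombinerSketch {

    /** Records emitted by a hash-style combiner holding at most `capacity` distinct keys. */
    static int hashCombine(int[] keys, int capacity) {
        Map<Integer, Long> table = new HashMap<>();
        int emitted = 0;
        for (int key : keys) {
            if (!table.containsKey(key) && table.size() == capacity) {
                emitted += table.size(); // table full: flush every partial aggregate
                table.clear();
            }
            table.merge(key, 1L, Long::sum); // fold the record into its key's aggregate
        }
        return emitted + table.size(); // final flush of whatever is left
    }

    /** Records emitted by a sort-style combiner buffering at most `capacity` input records. */
    static int sortCombine(int[] keys, int capacity) {
        List<Integer> buffer = new ArrayList<>();
        int emitted = 0;
        for (int key : keys) {
            buffer.add(key);
            if (buffer.size() == capacity) {
                emitted += flushSorted(buffer); // buffer full: sort, combine, emit
            }
        }
        return emitted + flushSorted(buffer); // final flush (0 if the buffer is empty)
    }

    /** Sorts the buffer, counts one output record per run of equal keys, then clears it. */
    private static int flushSorted(List<Integer> buffer) {
        Collections.sort(buffer);
        int groups = 0;
        Integer previous = null;
        for (Integer key : buffer) {
            if (!key.equals(previous)) {
                groups++;
            }
            previous = key;
        }
        buffer.clear();
        return groups;
    }

    public static void main(String[] args) {
        // Key 0 is "hot": 9 of 12 records, 4 distinct keys in total.
        int[] skewed = {0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 3};
        System.out.println("hash: " + hashCombine(skewed, 4));
        System.out.println("sort: " + sortCombine(skewed, 4));
    }
}
```

In this toy model the hot key occupies a single hash-table entry, while in the
sort buffer every duplicate takes up a slot, so the sort variant flushes more
often and emits more records for the same input and the same capacity.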
> Fix CombineHint documentation
> -----------------------------
>
> Key: FLINK-7234
> URL: https://issues.apache.org/jira/browse/FLINK-7234
> Project: Flink
> Issue Type: Bug
> Components: Documentation
> Affects Versions: 1.2.2, 1.4.0, 1.3.2
> Reporter: Greg Hogan
> Assignee: Greg Hogan
>
> The {{CombineHint}}
> [documentation|https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/batch/index.html]
> applies to {{DataSet#reduce}}, not {{DataSet#reduceGroup}}, and should also be
> noted for {{DataSet#distinct}}. It is also set with
> {{.setCombineHint(CombineHint)}} rather than alongside the UDF parameter.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)