[
https://issues.apache.org/jira/browse/FLINK-7234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16096381#comment-16096381
]
ASF GitHub Bot commented on FLINK-7234:
---------------------------------------
Github user greghogan commented on the issue:
https://github.com/apache/flink/pull/4372
@StephanEwen I like the new template. I much prefer free form over
checkboxes.
@fhueske I'm questioning my understanding of the heuristic for using a
hash-combine. For a fixed number of keys the hash-combine can be beneficial
independent of the size of the data set. If the decision is instead based on
the ratio of keys to values, then as the size of the data set increases, the
likelihood of values with matching keys occurring in the same combine
operation (before the table fills and is flushed to the reducer) decreases.
This is often the case for graphs. I'm thinking that the improvement from
using hash-combine on larger data sets may have been due to hashing performing
better than sorting in cases where we would have wanted to disable the
combiner.
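To make the two regimes concrete, here is a toy simulation in plain Java. It
is not Flink code: the class `CombineSketch`, the method `emitted`, the table
capacity of 1000 entries, the uniformly random keys, and the policy of
flushing the whole table when it is full are all assumptions chosen for
illustration. It only counts how many records a bounded in-memory combiner
would forward to the reducer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Toy model of a bounded hash combiner (hypothetical, not Flink's implementation).
final class CombineSketch {

    // Returns the number of records forwarded to the reducer after combining
    // numRecords inputs whose keys are drawn uniformly from [0, numKeys).
    static long emitted(int numRecords, int numKeys, int capacity, long seed) {
        Random rnd = new Random(seed);
        Map<Integer, Long> table = new HashMap<>();
        long out = 0;
        for (int i = 0; i < numRecords; i++) {
            int key = rnd.nextInt(numKeys);
            // Assumed policy: a new key arriving at a full table flushes everything.
            if (!table.containsKey(key) && table.size() >= capacity) {
                out += table.size();
                table.clear();
            }
            // Combine in place: here the "reduce" is just a per-key count.
            table.merge(key, 1L, Long::sum);
        }
        // Final flush of whatever is still buffered.
        return out + table.size();
    }

    public static void main(String[] args) {
        int capacity = 1_000;
        for (int n : new int[] {1_000, 100_000, 1_000_000}) {
            // Regime 1: fixed number of keys that fits in the table.
            long fixedKeys = emitted(n, 100, capacity, 42L);
            // Regime 2: fixed ratio of keys to values (one key per four values).
            long fixedRatio = emitted(n, n / 4, capacity, 42L);
            System.out.printf("n=%d fixedKeys=%d fixedRatio=%d%n", n, fixedKeys, fixedRatio);
        }
    }
}
```

Under these assumptions, the fixed-key case never forwards more records than
there are keys, whatever the input size, while the fixed-ratio case forwards
nearly every record once the number of distinct keys far exceeds the table
capacity.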
> Fix CombineHint documentation
> -----------------------------
>
> Key: FLINK-7234
> URL: https://issues.apache.org/jira/browse/FLINK-7234
> Project: Flink
> Issue Type: Bug
> Components: Documentation
> Affects Versions: 1.2.2, 1.4.0, 1.3.2
> Reporter: Greg Hogan
> Assignee: Greg Hogan
>
> The {{CombineHint}}
> [documentation|https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/batch/index.html]
> applies to {{DataSet#reduce}} not {{DataSet#reduceGroup}} and should also be
> noted for {{DataSet#distinct}}. It is also set with
> {{.setCombineHint(CombineHint)}} rather than alongside the UDF parameter.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)