[ 
https://issues.apache.org/jira/browse/FLINK-7234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16096381#comment-16096381
 ] 

ASF GitHub Bot commented on FLINK-7234:
---------------------------------------

Github user greghogan commented on the issue:

    https://github.com/apache/flink/pull/4372
  
    @StephanEwen I like the new template. I much prefer free form over 
checkboxes.
    
    @fhueske I'm questioning my understanding of the heuristic for using a 
hash-combine. For a fixed number of keys, the hash-combine can be beneficial 
independent of the size of the data set. If the decision is instead based on 
the ratio of keys to values, then as the size of the data set increases, the 
likelihood of records with matching keys occurring in the same combine 
operation (before the table fills and is flushed to the reducer) decreases.
    
    This is often the case for graphs. I'm thinking that the improvement for 
using hash-combine on larger data sets may have been due to hashing performing 
better than sort when we wanted to disable the combiner.
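    
    To make the point about key count versus data set size concrete, here is a 
minimal sketch. It is not Flink's implementation: the class name, the 
flush-everything-when-full policy, the table capacity of 1,000 keys, and the 
round-robin key order (a worst case for combining) are all assumptions chosen 
for illustration. It counts how many records a bounded hash table would pass on 
to the reducer.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a bounded hash-combiner; not Flink code.
class CombineSketch {

    /**
     * Feeds numRecords records, with keys assigned round-robin over numKeys
     * distinct keys, through a hash table holding at most `capacity` keys.
     * Returns the number of records sent on to the reducer.
     */
    static long emittedRecords(int numRecords, int numKeys, int capacity) {
        Map<Integer, Long> table = new HashMap<>();
        long emitted = 0;
        for (int i = 0; i < numRecords; i++) {
            int key = i % numKeys; // assumed worst-case (round-robin) order
            if (!table.containsKey(key) && table.size() == capacity) {
                // Table is full and the key is new: flush everything.
                emitted += table.size();
                table.clear();
            }
            table.merge(key, 1L, Long::sum); // combine values with equal keys
        }
        return emitted + table.size(); // final flush at end of input
    }

    public static void main(String[] args) {
        int capacity = 1_000;
        // Fixed number of keys: output stays at 100 as the input grows.
        System.out.println(emittedRecords(400, 100, capacity));
        System.out.println(emittedRecords(40_000, 100, capacity));
        // Fixed ratio (4 values per key): combining works while the keys fit
        // in the table (100 out), then stops helping at all (40000 out).
        System.out.println(emittedRecords(400, 100, capacity));
        System.out.println(emittedRecords(40_000, 10_000, capacity));
    }
}
```

    With real data the key order is rarely this adversarial, so the drop-off 
would be more gradual, but the trend is the same: what matters is whether the 
distinct keys fit in the table, not the ratio of keys to values.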


> Fix CombineHint documentation
> -----------------------------
>
>                 Key: FLINK-7234
>                 URL: https://issues.apache.org/jira/browse/FLINK-7234
>             Project: Flink
>          Issue Type: Bug
>          Components: Documentation
>    Affects Versions: 1.2.2, 1.4.0, 1.3.2
>            Reporter: Greg Hogan
>            Assignee: Greg Hogan
>
> The {{CombineHint}} 
> [documentation|https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/batch/index.html]
>  applies to {{DataSet#reduce}}, not {{DataSet#reduceGroup}}, and should also be 
> noted for {{DataSet#distinct}}. It is also set with 
> {{.setCombineHint(CombineHint)}} rather than alongside the UDF parameter.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
