[ 
https://issues.apache.org/jira/browse/FLINK-7234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16096403#comment-16096403
 ] 

ASF GitHub Bot commented on FLINK-7234:
---------------------------------------

Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/4372
  
    I think you are right @greghogan. 
    
    It's not about the ratio of #distinct keys to the size of the dataset, but 
it's also not only about the ratio of #distinct keys to the size of the memory. 
The skew of the key distribution has an effect as well (hash-based combiners 
should handle skew better than sort-based combiners).
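
To make the point about skew concrete, here is a minimal toy model, not Flink's actual combiner code. It assumes a hash combiner that keeps one partial aggregate per key and flushes the whole table when a new key no longer fits, and a sort combiner that buffers raw records and merges equal keys once the buffer is full. The function names, the `capacity` parameter (counted in keys for the hash table and in records for the sort buffer) and the two input distributions are all hypothetical and chosen only for illustration.

```python
from itertools import groupby


def hash_combine(keys, capacity):
    """Toy hash combiner: one running count per key. When a new key arrives
    and the table already holds `capacity` keys, emit the whole table and
    start over. Returns the number of partial aggregates emitted."""
    table = {}
    emitted = 0
    for key in keys:
        if key not in table and len(table) == capacity:
            emitted += len(table)  # flush every partial aggregate
            table.clear()
        table[key] = table.get(key, 0) + 1
    return emitted + len(table)  # final flush


def sort_combine(keys, capacity):
    """Toy sort combiner: buffer `capacity` raw records, sort them, and
    merge runs of equal keys. Returns the number of records emitted."""
    emitted = 0
    for start in range(0, len(keys), capacity):
        buffer = sorted(keys[start:start + capacity])
        emitted += sum(1 for _ in groupby(buffer))  # one record per run
    return emitted


# Both inputs have 1000 records and 20 distinct keys; only the skew differs.
uniform = [i % 20 for i in range(1000)]
# Key 0 is "hot" (9 of every 10 records); keys 1..19 share the rest.
skewed = [0 if i % 10 else 1 + (i // 10) % 19 for i in range(1000)]

for name, data in (("uniform", uniform), ("skewed", skewed)):
    print(name, hash_combine(data, 10), sort_combine(data, 10))
# uniform 1000 1000
# skewed 112 200
```

In this model neither strategy reduces the uniform input, because 20 round-robin keys never repeat within 10 slots. With the skewed input the hash table keeps absorbing the hot key between flushes, so it emits fewer records than the sort buffer even though dataset size, number of distinct keys and capacity are identical in both runs.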


> Fix CombineHint documentation
> -----------------------------
>
>                 Key: FLINK-7234
>                 URL: https://issues.apache.org/jira/browse/FLINK-7234
>             Project: Flink
>          Issue Type: Bug
>          Components: Documentation
>    Affects Versions: 1.2.2, 1.4.0, 1.3.2
>            Reporter: Greg Hogan
>            Assignee: Greg Hogan
>
> The {{CombineHint}} 
> [documentation|https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/batch/index.html]
>  applies to {{DataSet#reduce}} not {{DataSet#reduceGroup}} and should also be 
> noted for {{DataSet#distinct}}. It is also set with 
> {{.setCombineHint(CombineHint)}} rather than alongside the UDF parameter.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
