[GitHub] flink pull request: [FLINK-1963] Improve distinct() transformation

chiwanpark Mon, 13 Jul 2015 05:38:36 -0700

Github user chiwanpark commented on the pull request:

    https://github.com/apache/flink/pull/905#issuecomment-120913343
  
    Hi @pp86, thanks for your contribution.
    
    However, I don't think `AutoSelector` is the best way to improve the 
distinct transformation. In Flink, a `KeySelector` converts a `DataSet<O>` to 
`DataSet<Tuple2<K, O>>` and uses the first element of the tuple as the key. For 
atomic types, `AutoSelector` creates a `DataSet<Tuple2<V, V>>`, which 
unnecessarily duplicates the data.
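    
    To make the duplication concrete, here is a rough plain-Java sketch, not 
Flink code. The class `DistinctSketch` and both method names are hypothetical, 
and `Map.Entry` only stands in for `Tuple2`. It assumes the selector-based path 
wraps every element as (key, element) before deduplicating, as described above.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Plain-Java illustration only; none of these names are Flink APIs.
final class DistinctSketch {

    // Selector-style distinct: wrap each element as (key, element),
    // deduplicate on the key, then unwrap the element again.
    static <T, K> List<T> distinctViaSelector(List<T> input, Function<T, K> selector) {
        List<Map.Entry<K, T>> wrapped = new ArrayList<>();
        for (T element : input) {
            // With an identity selector this stores the same value twice: (v, v).
            wrapped.add(new AbstractMap.SimpleImmutableEntry<>(selector.apply(element), element));
        }
        Set<K> seen = new LinkedHashSet<>();
        List<T> result = new ArrayList<>();
        for (Map.Entry<K, T> pair : wrapped) {
            if (seen.add(pair.getKey())) {
                result.add(pair.getValue());
            }
        }
        return result;
    }

    // Direct distinct: the element itself is the key, so no wrapper is built.
    static <T> List<T> distinctDirect(List<T> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(3, 1, 3, 2, 1);
        System.out.println(distinctViaSelector(input, Function.identity())); // [3, 1, 2]
        System.out.println(distinctDirect(input));                           // [3, 1, 2]
    }
}
```

    Both paths return the same result, but the selector-style path carries 
every atomic value twice in its intermediate (v, v) pairs.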
    
    I recommend using `Keys.ExpressionKeys` when the user calls the 
`distinct()` method on atomic data types.
    
    It would also be good to add test cases for these changes.



