[ 
https://issues.apache.org/jira/browse/FLINK-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626523#comment-14626523
 ] 

ASF GitHub Bot commented on FLINK-1963:
---------------------------------------

Github user fhueske commented on the pull request:

    https://github.com/apache/flink/pull/905#issuecomment-121286738
  
    Hi, thanks for the pull request.
    
    I noticed that the `CollectionDataSets.getStringDataSet` data set does not 
contain any duplicates. Therefore, it is not well suited to test the distinct 
operator.
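
To illustrate the point about duplicates, here is a minimal plain-Java sketch. It does not use Flink; `DistinctTestData`, `distinct` and `brokenDistinct` are hypothetical names made up for this example. It shows that a "distinct" that does nothing at all would still pass a test whose input has no duplicates:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class DistinctTestData {

    // Reference behaviour: keep the first occurrence of each element.
    public static <T> List<T> distinct(List<T> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    // Deliberately broken (hypothetical): returns a copy of the input unchanged.
    public static <T> List<T> brokenDistinct(List<T> input) {
        return new ArrayList<>(input);
    }

    public static void main(String[] args) {
        List<String> unique = Arrays.asList("a", "b", "c");
        List<String> withDuplicates = Arrays.asList("a", "b", "a", "c", "b");

        // No duplicates in the input: the broken version cannot be told apart.
        System.out.println(distinct(unique).equals(brokenDistinct(unique)));

        // Duplicates in the input: the two results differ, so a test would catch the bug.
        System.out.println(distinct(withDuplicates).equals(brokenDistinct(withDuplicates)));
    }
}
```

The first comparison prints `true` and the second prints `false`, which is why the test input should contain repeated elements.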
    There are also a few more tests for the Distinct operator that need to be 
adapted:
    - DistinctITCase.java
    - DistinctOperatorTest.java
    - DistinctOperatorTest.scala
    
    The unit tests (*Test) check that valid arguments are accepted and invalid 
arguments are rejected. It would be nice if you could try a distinct() 
on a DataSet<YourObject>, where YourObject is not a POJO but a generic type 
that does not implement `Comparable`, i.e., it is not a key type.
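
The shape of such a negative test can be sketched in plain Java. This is only a model of the rule stated above (a generic type that does not implement `Comparable` is not a key type), not Flink code: `KeyTypeCheck`, `requireKeyType`, `isRejected` and `NotComparable` are hypothetical names, and the use of `IllegalArgumentException` is an assumption made for the example.

```java
public class KeyTypeCheck {

    // A generic type that does NOT implement Comparable (hypothetical example type).
    public static class NotComparable {
        final int value;

        NotComparable(int value) {
            this.value = value;
        }
    }

    // Hypothetical stand-in for the operator's argument validation:
    // only types that implement Comparable are accepted as keys.
    public static void requireKeyType(Class<?> type) {
        if (!Comparable.class.isAssignableFrom(type)) {
            throw new IllegalArgumentException(type.getSimpleName() + " is not a key type");
        }
    }

    // What a unit test would assert: true if the type is rejected.
    public static boolean isRejected(Class<?> type) {
        try {
            requireKeyType(type);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isRejected(NotComparable.class)); // not Comparable
        System.out.println(isRejected(String.class));        // String implements Comparable
    }
}
```

A real test in DistinctOperatorTest would call distinct() on the DataSet itself and expect whatever exception the operator throws for invalid keys.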
    
    Thank you, Fabian


> Improve distinct() transformation
> ---------------------------------
>
>                 Key: FLINK-1963
>                 URL: https://issues.apache.org/jira/browse/FLINK-1963
>             Project: Flink
>          Issue Type: Improvement
>          Components: Java API, Scala API
>    Affects Versions: 0.9
>            Reporter: Fabian Hueske
>            Assignee: pietro pinoli
>            Priority: Minor
>              Labels: starter
>             Fix For: 0.9
>
>
> The `distinct()` transformation is a bit limited right now with respect to 
> processing atomic key types:
> - `distinct(String ...)` works only for composite data types (POJO, tuple), 
> but wildcard expressions should also be supported for atomic key types
> - `distinct()` only works for composite types, but should also work for 
> atomic key types
> - `distinct(KeySelector)` is the most generic one, but not very handy to use
> - `distinct(int ...)` works only for Tuple data types (which is fine)
> Fixing this should be rather easy.



