[
https://issues.apache.org/jira/browse/FLINK-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626523#comment-14626523
]
ASF GitHub Bot commented on FLINK-1963:
---------------------------------------
Github user fhueske commented on the pull request:
https://github.com/apache/flink/pull/905#issuecomment-121286738
Hi, thanks for the pull request.
I noticed that the `CollectionDataSets.getStringDataSet` data set does not
contain any duplicates. Therefore, it is not well suited to test the distinct
operator.
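To illustrate the point, here is a minimal sketch in plain Java. It uses `java.util` collections rather than Flink's DataSet API, and the class and method names are hypothetical. It shows that a broken "distinct" which returns its input unchanged passes on duplicate-free input and only fails once the input contains duplicates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical stand-in for a distinct test; not Flink code.
class DistinctInputSketch {

    // Reference behaviour: drop duplicates, keep first occurrences in order.
    static List<String> referenceDistinct(List<String> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    // Deliberately broken "distinct" that returns a copy of its input.
    static List<String> noOpDistinct(List<String> input) {
        return new ArrayList<>(input);
    }

    public static void main(String[] args) {
        List<String> unique = Arrays.asList("a", "b", "c");
        List<String> withDuplicates = Arrays.asList("a", "b", "a", "c", "b");

        // Duplicate-free input: the broken version matches the reference.
        System.out.println(
            noOpDistinct(unique).equals(referenceDistinct(unique)));

        // Input with duplicates: the broken version is exposed.
        System.out.println(
            noOpDistinct(withDuplicates).equals(referenceDistinct(withDuplicates)));
    }
}
```

A test input for the operator should therefore contain at least a few repeated elements.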
There are also a few more tests for the Distinct operator that need to be
adapted:
- DistinctITCase.java
- DistinctOperatorTest.java
- DistinctOperatorTest.scala
The unit tests (*Test) check that correct arguments are accepted and invalid
arguments are rejected. It would be nice if you could try a distinct()
on a DataSet<YourObject>, where YourObject is not a POJO but a generic type
that does not implement `Comparable`, i.e., it is not a key type.
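As a rough sketch of what such a type could look like: the class below is hypothetical and uses no Flink classes. It assumes, based on Flink's usual POJO rules, that a class without a public no-argument constructor and with a private field lacking a getter and setter is handled as a generic type. It also does not implement `Comparable`.

```java
// Hypothetical example type for the unit test; names are illustrative only.
class NonKeyTypeSketch {

    // No public no-argument constructor, a private field without
    // getter/setter, and no Comparable implementation.
    static final class CustomType {
        private final String value;

        CustomType(String value) {
            this.value = value;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof CustomType
                && ((CustomType) other).value.equals(value);
        }

        @Override
        public int hashCode() {
            return value.hashCode();
        }
    }

    // Helper to confirm the type really lacks a natural ordering.
    static boolean isComparable(Object candidate) {
        return candidate instanceof Comparable;
    }

    public static void main(String[] args) {
        System.out.println(isComparable(new CustomType("a")));
        System.out.println(isComparable("a"));
    }
}
```

A unit test could then build a DataSet of `CustomType` elements and check how distinct() reacts to it.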
Thank you, Fabian
> Improve distinct() transformation
> ---------------------------------
>
> Key: FLINK-1963
> URL: https://issues.apache.org/jira/browse/FLINK-1963
> Project: Flink
> Issue Type: Improvement
> Components: Java API, Scala API
> Affects Versions: 0.9
> Reporter: Fabian Hueske
> Assignee: pietro pinoli
> Priority: Minor
> Labels: starter
> Fix For: 0.9
>
>
> The `distinct()` transformation is a bit limited right now with respect to
> processing atomic key types:
> - `distinct(String ...)` works only for composite data types (POJO, tuple),
> but wildcard expressions should also be supported for atomic key types
> - `distinct()` only works for composite types, but should also work for
> atomic key types
> - `distinct(KeySelector)` is the most generic one, but not very handy to use
> - `distinct(int ...)` works only for Tuple data types (which is fine)
> Fixing this should be rather easy.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)