[
https://issues.apache.org/jira/browse/FLINK-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561882#comment-14561882
]
Fabian Hueske commented on FLINK-1963:
--------------------------------------
Sure. Assume you have a {{DataSet<Integer>}}. Integer is an atomic type,
i.e., it is not composed of other types. At the moment it is not possible to
use the {{distinct}} transformation to convert a data set such as
{{\[1,2,2,1,3,5,3\]}} into {{\[1,2,3,5\]}}.
To be consistent with the rest of the API, this should be possible in three
ways:
{code}
DataSet<Integer> myInts = ...
DataSet<Integer> myUniqueInt1 = myInts.distinct();
// "*" is a wildcard expression (Java style) referring to the full type
DataSet<Integer> myUniqueInt2 = myInts.distinct("*");
// "_" is a wildcard expression (Scala style) referring to the full type
DataSet<Integer> myUniqueInt3 = myInts.distinct("_");
{code}
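To illustrate the intended semantics only, here is a minimal plain-Java sketch
that does not use the Flink API. The class {{DistinctSketch}} and its methods
are hypothetical names made up for this example. The idea is that for an atomic
type the element itself acts as the key, so {{distinct()}} should behave like
{{distinct(KeySelector)}} with an identity key selector.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Plain-Java illustration of the semantics; this is NOT the Flink DataSet API.
class DistinctSketch {

    // Hypothetical helper: keeps the first element seen for each key.
    static <T, K> List<T> distinctByKey(List<T> input, Function<T, K> keySelector) {
        Set<K> seenKeys = new HashSet<>();
        List<T> result = new ArrayList<>();
        for (T element : input) {
            // Set.add returns false if the key was already present.
            if (seenKeys.add(keySelector.apply(element))) {
                result.add(element);
            }
        }
        return result;
    }

    // For an atomic type the whole element is the key.
    static <T> List<T> distinct(List<T> input) {
        return distinctByKey(input, Function.identity());
    }

    public static void main(String[] args) {
        List<Integer> myInts = Arrays.asList(1, 2, 2, 1, 3, 5, 3);
        System.out.println(distinct(myInts)); // prints [1, 2, 3, 5]
    }
}
```

Note that this sketch keeps elements in first-occurrence order because it runs
sequentially over a list. A distributed {{distinct}} would not necessarily
give any ordering guarantee.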
This section of the [Flink
documentation|http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#specifying-keys]
about specifying keys might be interesting for you.
Let me know if you have further questions.
> Improve distinct() transformation
> ---------------------------------
>
> Key: FLINK-1963
> URL: https://issues.apache.org/jira/browse/FLINK-1963
> Project: Flink
> Issue Type: Improvement
> Components: Java API, Scala API
> Affects Versions: 0.9
> Reporter: Fabian Hueske
> Assignee: pietro pinoli
> Priority: Minor
> Labels: starter
> Fix For: 0.9
>
>
> The `distinct()` transformation is a bit limited right now with respect to
> processing atomic key types:
> - `distinct(String ...)` works only for composite data types (POJO, tuple),
> but wildcard expressions should also be supported for atomic key types
> - `distinct()` only works for composite types, but should also work for
> atomic key types
> - `distinct(KeySelector)` is the most generic one, but not very handy to use
> - `distinct(int ...)` works only for Tuple data types (which is fine)
> Fixing this should be rather easy.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)