[
https://issues.apache.org/jira/browse/FLINK-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627150#comment-14627150
]
ASF GitHub Bot commented on FLINK-1963:
---------------------------------------
Github user fhueske commented on a diff in the pull request:
https://github.com/apache/flink/pull/905#discussion_r34627102
--- Diff:
flink-tests/src/test/java/org/apache/flink/test/javaApiOperators/DistinctITCase.java
---
@@ -274,4 +277,81 @@ public Integer map(POJO value) throws Exception {
return (int) value.nestedPojo.longNumber;
}
}
+
+ @Test
+ public void testCorrectnessOfDistinctOnAtomic() throws Exception {
+ /*
+ * check correctness of distinct on Integers
+ */
+
+ final ExecutionEnvironment env =
ExecutionEnvironment.getExecutionEnvironment();
+ DataSet<Integer> ds = CollectionDataSets.getIntegerDataSet(env);
+ DataSet<Integer> reduceDs = ds.distinct();
+
+ List<Integer> result = reduceDs.collect();
+
+ String expected = "1\n2\n3\n4\n5";
+
+ compareResultAsText(result, expected);
+ }
+
+ @Test
+ public void testCorrectnessOfDistinctOnAtomicWithSelectAllChar() throws
Exception {
+ /*
+ * check correctness of distinct on Strings, using
Keys.ExpressionKeys.SELECT_ALL_CHAR
+ */
+
+ final ExecutionEnvironment env =
ExecutionEnvironment.getExecutionEnvironment();
+ DataSet<String> ds = CollectionDataSets.getStringDataSet(env);
+ DataSet<String> reduceDs = ds.union(ds).distinct("*");
+
+ List<String> result = reduceDs.collect();
+
+ String expected = "I am fine.\n" +
+ "Luke Skywalker\n" +
+ "LOL\n" +
+ "Hello world, how are you?\n" +
+ "Hi\n" +
+ "Hello world\n" +
+ "Hello\n" +
+ "Random comment\n";
+
+ compareResultAsText(result, expected);
+ }
+
+ @Test(expected = InvalidProgramException.class)
+ public void testDistinctOnNotKeyDataType() throws Exception {
--- End diff --
We usually add such failing tests to the unit tests
(DistinctOperatorTest.java) instead of integration tests.
> Improve distinct() transformation
> ---------------------------------
>
> Key: FLINK-1963
> URL: https://issues.apache.org/jira/browse/FLINK-1963
> Project: Flink
> Issue Type: Improvement
> Components: Java API, Scala API
> Affects Versions: 0.9
> Reporter: Fabian Hueske
> Assignee: pietro pinoli
> Priority: Minor
> Labels: starter
> Fix For: 0.9
>
>
> The `distinct()` transformation is a bit limited right now with respect to
> processing atomic key types:
> - `distinct(String ...)` works only for composite data types (POJO, tuple),
> but wildcard expression should also be supported for atomic key types
> - `distinct()` only works for composite types, but should also work for
> atomic key types
> - `distinct(KeySelector)` is the most generic one, but not very handy to use
> - `distinct(int ...)` works only for Tuple data types (which is fine)
> Fixing this should be rather easy.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)