[ https://issues.apache.org/jira/browse/FLINK-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230460#comment-15230460 ]
ASF GitHub Bot commented on FLINK-3665:
---------------------------------------
Github user fhueske commented on a diff in the pull request:
https://github.com/apache/flink/pull/1848#discussion_r58897094
--- Diff: flink-tests/src/test/java/org/apache/flink/test/javaApiOperators/PartitionITCase.java ---
@@ -546,6 +549,60 @@ public void testRangePartitionInIteration() throws Exception {
 		result.collect(); // should fail
 	}

+	@Test
+	public void testRangePartitionerOnSequenceDataWithOrders() throws Exception {
+		final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+		DataSet<Tuple2<Long, Long>> dataSet = env.generateSequence(0, 10000)
+			.map(new MapFunction<Long, Tuple2<Long, Long>>() {
+				@Override
+				public Tuple2<Long, Long> map(Long value) throws Exception {
+					return new Tuple2<>(value / 5000, value % 5000);
+				}
+			});
+
+		final Tuple2Comparator<Long> tuple2Comparator = new Tuple2Comparator<>(new LongComparator(true),
+			new LongComparator(false));
+
+		MinMaxSelector<Tuple2<Long, Long>> minMaxSelector = new MinMaxSelector<>(tuple2Comparator);
+
+		final List<Tuple2<Tuple2<Long, Long>, Tuple2<Long, Long>>> collected = dataSet.partitionByRange(0, 1)
+			.withOrders(Order.ASCENDING, Order.DESCENDING)
+			.mapPartition(minMaxSelector)
+			.collect();
+
+		Collections.sort(collected, new Tuple2Comparator<>(tuple2Comparator));
+
+		Tuple2<Long, Long> previousMax = null;
+		for (Tuple2<Tuple2<Long, Long>, Tuple2<Long, Long>> tuple2 : collected) {
+			if (previousMax == null) {
+				assertTrue(tuple2Comparator.compare(tuple2.f0, tuple2.f1) < 0);
+				previousMax = tuple2.f1;
+			} else {
+				assertTrue(tuple2Comparator.compare(tuple2.f0, tuple2.f1) < 0);
+				if (previousMax.f0.equals(tuple2.f0.f0)) {
+					assertEquals(previousMax.f1 - 1, tuple2.f0.f1.longValue());
+				}
+				previousMax = tuple2.f1;
+			}
+		}
+	}
+
+	@Test(expected = IllegalStateException.class)
+	public void testHashPartitionWithOrders() throws Exception {
--- End diff ---
Please move this test method and the `testRebalanceWithOrders()` method into a new test class `org.apache.flink.api.java.operator.PartitionOperatorTest` in the `flink-java` module. There are more tests for operators, such as `GroupingTest`, that validate the correct behavior of the API without actually executing programs. In fact, there should already have been a test class for the partition operator, but for some reason these tests are missing.
It would be nice if you could add more tests to the `PartitionOperatorTest`, including:
- valid and invalid numbers of orders for flat and nested keys
- valid and invalid key definitions for range and hash partitioning
- valid and invalid custom partitioners for range and hash partitioning
- any other corner cases that come to mind.
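Such API-level tests only need to construct the operator and assert that it fails, without executing a program. As a simplified, self-contained sketch of the fail-fast contract that `withOrders(Order...)` is meant to enforce (the `Order`, `PartitionMethod`, and `SimplePartitionOperator` types below are hypothetical stand-ins for illustration, not Flink's actual classes):

```java
// Hypothetical stand-ins sketching the proposed withOrders() contract;
// these are NOT Flink's real classes.
enum Order { ASCENDING, DESCENDING }

enum PartitionMethod { RANGE, HASH, REBALANCE }

class SimplePartitionOperator {
	private final PartitionMethod method;
	private Order[] orders;

	SimplePartitionOperator(PartitionMethod method) {
		this.method = method;
	}

	// Sort orders only make sense for range partitioning, so the call
	// fails fast for any other partitioning method.
	SimplePartitionOperator withOrders(Order... orders) {
		if (method != PartitionMethod.RANGE) {
			throw new IllegalStateException(
					"Sort orders can only be defined for range partitioning.");
		}
		this.orders = orders;
		return this;
	}

	Order[] getOrders() {
		return orders;
	}
}
```

A test for the invalid case then just builds the operator with a non-range method, calls `withOrders(...)`, and expects the `IllegalStateException`, mirroring the `@Test(expected = IllegalStateException.class)` pattern in the diff above.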
> Range partitioning lacks support to define sort orders
> ------------------------------------------------------
>
> Key: FLINK-3665
> URL: https://issues.apache.org/jira/browse/FLINK-3665
> Project: Flink
> Issue Type: Improvement
> Components: DataSet API
> Affects Versions: 1.0.0
> Reporter: Fabian Hueske
> Fix For: 1.1.0
>
>
> {{DataSet.partitionByRange()}} does not allow specifying the sort order of
> fields. This is fine if range partitioning is only used to mitigate skewed
> partitioning.
> However, it is not sufficient if range partitioning is used to sort a data
> set in parallel.
> Since {{DataSet.partitionByRange()}} is {{@Public}} API and cannot easily be
> changed, I propose to add a method {{withOrders(Order... orders)}} to
> {{PartitionOperator}}. The method should throw an exception if the
> partitioning method of the {{PartitionOperator}} is not range partitioning.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)