[jira] [Commented] (FLINK-3665) Range partitioning lacks support to define sort orders

ASF GitHub Bot (JIRA) Thu, 07 Apr 2016 08:58:39 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230460#comment-15230460
 ]


ASF GitHub Bot commented on FLINK-3665:
---------------------------------------

Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1848#discussion_r58897094
  
    --- Diff: 
flink-tests/src/test/java/org/apache/flink/test/javaApiOperators/PartitionITCase.java
 ---
    @@ -546,6 +549,60 @@ public void testRangePartitionInIteration() throws 
Exception {
                result.collect(); // should fail
        }
     
    +   @Test
    +   public void testRangePartitionerOnSequenceDataWithOrders() throws 
Exception {
    +           final ExecutionEnvironment env = 
ExecutionEnvironment.getExecutionEnvironment();
    +           DataSet<Tuple2<Long, Long>> dataSet = env.generateSequence(0, 
10000)
    +                           .map(new MapFunction<Long, Tuple2<Long, 
Long>>() {
    +                   @Override
    +                   public Tuple2<Long, Long> map(Long value) throws 
Exception {
    +                           return new Tuple2<>(value / 5000, value % 5000);
    +                   }
    +           });
    +
    +           final Tuple2Comparator<Long> tuple2Comparator = new 
Tuple2Comparator<>(new LongComparator(true),
    +                                                                           
                                                                           new 
LongComparator(false));
    +
    +           MinMaxSelector<Tuple2<Long, Long>> minMaxSelector = new 
MinMaxSelector<>(tuple2Comparator);
    +
    +           final List<Tuple2<Tuple2<Long, Long>, Tuple2<Long, Long>>> 
collected = dataSet.partitionByRange(0, 1)
    +                           .withOrders(Order.ASCENDING, Order.DESCENDING)
    +                           .mapPartition(minMaxSelector)
    +                           .collect();
    +
    +           Collections.sort(collected, new 
Tuple2Comparator<>(tuple2Comparator));
    +
    +           Tuple2<Long, Long> previousMax = null;
    +           for (Tuple2<Tuple2<Long, Long>, Tuple2<Long, Long>> tuple2 : 
collected) {
    +                   if (previousMax == null) {
    +                           assertTrue(tuple2Comparator.compare(tuple2.f0, 
tuple2.f1) < 0);
    +                           previousMax = tuple2.f1;
    +                   } else {
    +                           assertTrue(tuple2Comparator.compare(tuple2.f0, 
tuple2.f1) < 0);
    +                           if (previousMax.f0.equals(tuple2.f0.f0)) {
    +                                   assertEquals(previousMax.f1 - 1, 
tuple2.f0.f1.longValue());
    +                           }
    +                           previousMax = tuple2.f1;
    +                   }
    +           }
    +   }
    +
    +   @Test(expected = IllegalStateException.class)
    +   public void testHashPartitionWithOrders() throws Exception {
    --- End diff --
    
    Please move this test method and the `testRebalanceWithOrders()` method 
into a new test class 
`org.apache.flink.api.java.operator.PartitionOperatorTest` in the `flink-java` 
module. There are more tests for operators like `GroupingTest` that validate 
correct behavior of the API without actually executing programs. In fact, there 
should have been already a test class for the partition operator but for some 
reason these tests are missing.
    
    It would be nice if you could add more tests to the PartitionOperatorTest, 
including
    - test valid and invalid number of orders for flat and nested keys
    - test valid and invalid key definitions for range and hash partitioning
    - test valid and invalid custom partitioners for range and hash partitioning
    - other corner cases that come to your mind.


> Range partitioning lacks support to define sort orders
> ------------------------------------------------------
>
>                 Key: FLINK-3665
>                 URL: https://issues.apache.org/jira/browse/FLINK-3665
>             Project: Flink
>          Issue Type: Improvement
>          Components: DataSet API
>    Affects Versions: 1.0.0
>            Reporter: Fabian Hueske
>             Fix For: 1.1.0
>
>
> {{DataSet.partitionByRange()}} does not allow to specify the sort order of 
> fields. This is fine if range partitioning is used to reduce skewed 
> partitioning. 
> However, it is not sufficient if range partitioning is used to sort a data 
> set in parallel. 
> Since {{DataSet.partitionByRange()}} is {{@Public}} API and cannot be easily 
> changed, I propose to add a method {{withOrders(Order... orders)}} to 
> {{PartitionOperator}}. The method should throw an exception if the 
> partitioning method of {{PartitionOperator}} is not range partitioning.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (FLINK-3665) Range partitioning lacks support to define sort orders

Reply via email to