[jira] [Commented] (FLINK-2946) Add orderBy() to Table API

Dawid Wysakowicz (JIRA) Fri, 01 Apr 2016 04:29:37 -0700

    [ https://issues.apache.org/jira/browse/FLINK-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15221567#comment-15221567 ]


Dawid Wysakowicz commented on FLINK-2946:
-----------------------------------------

I still have some problems with range partitioning and parallelism.

* First of all, the {{org.apache.flink.api.java.DataSet}} that I get from 
{{translateToPlan}} does not have a {{getParallelism}} method. But that's a 
minor issue.
* I am not sure how to determine the final parallelism of the input, or 
whether I need to do this at all. Let's take this as an example:

{code}
    val env = ExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)

    val t = env.fromElements((1, 3, "Third"), (1, 2, "Fourth"), (1, 4, "Second"),
      (2, 1, "Sixth"), (1, 5, "First"), (1, 1, "Fifth")).setParallelism(4)
      .toTable.orderBy('_1.asc, '_2.desc)
{code}

The dataset then looks like this (the numbers in brackets are the parallelism 
of each operator): DataSource(4) -> MapOperator(-1) -> here I must apply either 
SortOperator or PartitionOperator -> SortOperator.

Based on which parallelism should I decide whether the PartitionOperator should 
be applied? What should the parallelism of the PartitionOperator be? (By 
default it is the one from the ExecutionEnvironment.)
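
To make the question concrete, here is a minimal, Flink-free sketch of one possible way to frame the decision. It is not something agreed on in this issue. It assumes that -1 means "parallelism not set, fall back to the environment default" (which matches the MapOperator(-1) above) and that range partitioning is only useful when the sort runs with a parallelism greater than 1. The object and method names are hypothetical and not part of Flink's API.

```scala
// Sketch only: ParallelismSketch and its methods are hypothetical helpers,
// not Flink API. They model the decision discussed above with plain Ints.
object ParallelismSketch {
  // Assumption: -1 marks an operator whose parallelism was not set explicitly.
  val ParallelismDefault: Int = -1

  // Resolve the parallelism an operator would effectively run with:
  // an unset value falls back to the environment's default.
  def effectiveParallelism(operatorParallelism: Int, envParallelism: Int): Int =
    if (operatorParallelism == ParallelismDefault) envParallelism
    else operatorParallelism

  // Assumption: with a single sort task there is only one partition,
  // so a preceding range partitioning step would have no effect.
  def needsRangePartition(sortParallelism: Int): Boolean =
    sortParallelism > 1

  def main(args: Array[String]): Unit = {
    val envParallelism = 1                   // env.setParallelism(1)
    val mapParallelism = ParallelismDefault  // MapOperator(-1)
    val sortParallelism = effectiveParallelism(mapParallelism, envParallelism)
    println(s"sort parallelism = $sortParallelism, " +
      s"range partition = ${needsRangePartition(sortParallelism)}")
  }
}
```

Under these assumptions the example above would sort with parallelism 1 and skip the PartitionOperator, even though the DataSource itself runs with parallelism 4. Whether the input's explicit parallelism should take precedence instead is exactly the open question.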

Hope I stated my problems clearly.

> Add orderBy() to Table API
> --------------------------
>
>                 Key: FLINK-2946
>                 URL: https://issues.apache.org/jira/browse/FLINK-2946
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API
>            Reporter: Timo Walther
>            Assignee: Dawid Wysakowicz
>
> In order to implement a FLINK-2099 prototype that uses the Table API's code 
> generation facilities, the Table API needs a sorting feature.
> I would implement it in the next few days. Ideas on how to implement such a 
> sorting feature are very welcome. Is there a more efficient way than 
> {{.sortPartition(...).setParallelism(1)}}? Is it better to sort locally on the 
> nodes first and then do a final sort on one node afterwards?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
