[
https://issues.apache.org/jira/browse/FLINK-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952773#comment-14952773
]
ASF GitHub Bot commented on FLINK-7:
------------------------------------
GitHub user ChengXiangLi opened a pull request:
https://github.com/apache/flink/pull/1255
[FLINK-7] [Runtime] Enable Range Partitioner.
This PR enables the range partitioner for Flink, following the path of the
other existing partitioners. It depends on the sample operator to randomly
sample data from a `DataSet`, and builds range boundaries from the sampled
data. Two further notes about the PR:
1. Why execute the sample data job in `JobGraphGenerator` instead of
`PartitionOperator`?
i. Launching another job at compile time would lead to infinite job
submission, because the `DataSink`s have not been cleared at compile time.
ii. We need the target stage parallelism to decide the sample size, and
the `TypeSerializer`/`TypeComparator` to serialize/sort the sampled data.
2. Expand the `DataDistribution` API. The previous `DataDistribution` takes
`Key[]` as range boundaries, but there is no simple generic way to extract a
Key from a nested object, and `TypeComparator::compareAgainstReference()` is
not supported by the current comparators. Using `DataSet` elements as the
range boundaries makes everything much easier: we can use
`TypeComparator::compare()` directly, both for sorting while building the
`DataDistribution` and for selecting the channel.
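
As a rough illustration of the idea in point 2 (not the code in this PR), the
sketch below builds range boundaries from sampled elements and selects a
channel by comparing elements directly. `RangeBoundaries`, `build`, and
`selectChannel` are hypothetical names, and `java.util.Comparator` stands in
for Flink's `TypeComparator`; the evenly spaced quantile selection is an
assumption about how boundaries might be chosen.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper, not part of Flink: range boundaries taken from sampled elements. */
final class RangeBoundaries {

    private RangeBoundaries() {
    }

    /**
     * Sorts a copy of the sample and picks (numChannels - 1) evenly spaced
     * elements. Boundary i is the inclusive upper bound of channel i.
     */
    static <T> List<T> build(List<T> sample, Comparator<? super T> comparator, int numChannels) {
        List<T> boundaries = new ArrayList<>();
        if (sample.isEmpty() || numChannels < 2) {
            // No boundaries: every element is routed to channel 0.
            return boundaries;
        }
        List<T> sorted = new ArrayList<>(sample);
        sorted.sort(comparator);
        for (int i = 1; i < numChannels; i++) {
            // Position of the i-th quantile in the sorted sample.
            int index = (int) ((long) i * sorted.size() / numChannels);
            boundaries.add(sorted.get(Math.min(index, sorted.size() - 1)));
        }
        return boundaries;
    }

    /**
     * Binary search for the first boundary that is greater than or equal to
     * the element. The index of that boundary is the target channel; elements
     * above all boundaries go to the last channel.
     */
    static <T> int selectChannel(T element, List<T> boundaries, Comparator<? super T> comparator) {
        int low = 0;
        int high = boundaries.size();
        while (low < high) {
            int mid = (low + high) >>> 1;
            if (comparator.compare(element, boundaries.get(mid)) > 0) {
                low = mid + 1;
            } else {
                high = mid;
            }
        }
        return low;
    }
}
```

Because the boundaries are ordinary elements, the same comparator serves both
steps, and no separate key extraction is needed for nested objects.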
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ChengXiangLi/flink rangepartitioner
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/1255.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1255
----
commit 8a41b18c6c40115d545271039e51ebad44300191
Author: chengxiang li <[email protected]>
Date: 2015-10-12T07:13:38Z
[FLINK-7] [Runtime] Enable Range Partitioner.
----
> [GitHub] Enable Range Partitioner
> ---------------------------------
>
> Key: FLINK-7
> URL: https://issues.apache.org/jira/browse/FLINK-7
> Project: Flink
> Issue Type: Sub-task
> Components: Distributed Runtime
> Reporter: GitHub Import
> Assignee: Chengxiang Li
> Fix For: pre-apache
>
>
> The range partitioner is currently disabled. We need to implement the
> following aspects:
> 1) Distribution information, if available, must be propagated back together
> with the ordering property.
> 2) A generic bucket lookup structure (currently specific to PactRecord).
> Tests to re-enable after fixing this issue:
> - TeraSortITCase
> - GlobalSortingITCase
> - GlobalSortingMixedOrderITCase
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/7
> Created by: [StephanEwen|https://github.com/StephanEwen]
> Labels: core, enhancement, optimizer,
> Milestone: Release 0.4
> Assignee: [fhueske|https://github.com/fhueske]
> Created at: Fri Apr 26 13:48:24 CEST 2013
> State: open