[ https://issues.apache.org/jira/browse/FLINK-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063322#comment-15063322 ]
ASF GitHub Bot commented on FLINK-7:
------------------------------------
Github user ChengXiangLi commented on the pull request:
https://github.com/apache/flink/pull/1255#issuecomment-165652344
Hi @fhueske, regarding the partitioning part, I think it's expected that
`RangePartition` is slower than `HashPartition`; as you mentioned,
`RangePartition` introduces more overhead. The main difference between
the two is that `HashPartition` partitions key-wise (elements with the
same key are shuffled to the same target), while `RangePartition`
partitions both key-wise and partition-wise (the partitions themselves are
ordered as well). So for a global order, we can sort in parallel after
`RangePartition`; that is the benefit we get from it.
On the other hand, it still makes sense to improve `RangePartition`
performance, although I don't think increasing the sample size would help here.
Based on my previous calculations and tests, a sample size of
`parallelism * 20` is enough to generate well-proportioned partitions. Do
you see data skew in any partition after `RangePartition`?
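
To make the difference concrete, here is a minimal standalone sketch in Python. It is not Flink code: the function names, the fixed seed, and the way split points are picked from the sample are assumptions for illustration only. It samples `parallelism * 20` elements, derives `parallelism - 1` boundaries from the sorted sample, and shows that sorting each range partition locally and concatenating the results gives a globally sorted output, which hash partitioning does not guarantee.

```python
import bisect
import random


def hash_partition(records, parallelism):
    """Key-wise only: equal keys land in the same partition, partitions are unordered."""
    parts = [[] for _ in range(parallelism)]
    for r in records:
        parts[hash(r) % parallelism].append(r)
    return parts


def range_boundaries(records, parallelism, samples_per_partition=20, seed=0):
    """Pick parallelism - 1 split points from a random sample (hypothetical helper)."""
    rng = random.Random(seed)
    sample_size = min(len(records), parallelism * samples_per_partition)
    sample = sorted(rng.sample(records, sample_size))
    if not sample:
        return []
    # Evenly spaced positions in the sorted sample serve as partition boundaries.
    return [sample[len(sample) * i // parallelism] for i in range(1, parallelism)]


def range_partition(records, boundaries):
    """Key-wise and partition-wise: partition i only holds keys <= boundaries[i]."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for r in records:
        parts[bisect.bisect_left(boundaries, r)].append(r)
    return parts


if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.randrange(1_000_000) for _ in range(10_000)]
    parallelism = 4

    parts = range_partition(data, range_boundaries(data, parallelism))
    # Each partition can be sorted independently (in parallel in a real system).
    globally_sorted = [x for p in parts for x in sorted(p)]
    print("globally sorted:", globally_sorted == sorted(data))
    print("partition sizes:", [len(p) for p in parts])
```

Printing the partition sizes is a quick way to check for the kind of skew asked about above; with uniformly distributed keys the sizes should come out roughly equal.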
> [GitHub] Enable Range Partitioner
> ---------------------------------
>
> Key: FLINK-7
> URL: https://issues.apache.org/jira/browse/FLINK-7
> Project: Flink
> Issue Type: Sub-task
> Components: Distributed Runtime
> Reporter: GitHub Import
> Assignee: Chengxiang Li
> Fix For: pre-apache
>
>
> The range partitioner is currently disabled. We need to implement the
> following aspects:
> 1) Distribution information, if available, must be propagated back together
> with the ordering property.
> 2) A generic bucket lookup structure (currently specific to PactRecord).
> Tests to re-enable after fixing this issue:
> - TeraSortITCase
> - GlobalSortingITCase
> - GlobalSortingMixedOrderITCase
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/7
> Created by: [StephanEwen|https://github.com/StephanEwen]
> Labels: core, enhancement, optimizer
> Milestone: Release 0.4
> Assignee: [fhueske|https://github.com/fhueske]
> Created at: Fri Apr 26 13:48:24 CEST 2013
> State: open
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)