[ https://issues.apache.org/jira/browse/FLINK-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066420#comment-15066420 ]
ASF GitHub Bot commented on FLINK-7:
------------------------------------
Github user ChengXiangLi commented on the pull request:
https://github.com/apache/flink/pull/1255#issuecomment-166297616
Sorry, @fhueske, I misunderstood your test data: the keys should be skewed
on some value, while in my previous test the keys were not skewed. It is
complicated to calculate how many samples should be taken from a dataset to meet
an a priori specified accuracy guarantee. One such algorithm, which I have used
before, is described at
http://research.microsoft.com/pubs/159275/MSR-TR-2012-18.pdf
but it probably does not fully fit the case where the keys are skewed.
Would you continue testing how many samples are required to make the partitions
roughly balanced? Raising the sample number should not add much overhead, so I
fully support it.
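
To make the sample-size question concrete, here is a minimal sketch in Python. It is not the code from the pull request and not the algorithm from the linked report. It assumes split points are taken at equally spaced ranks of a sorted random sample and that records are assigned by binary search over those split points. The function names, the hot key 500, the 30% skew, and the partition count of 8 are all hypothetical values chosen for illustration.

```python
import bisect
import random
from collections import Counter


def range_boundaries(sample, num_partitions):
    """Pick num_partitions - 1 split points at equally spaced ranks of the sorted sample."""
    ordered = sorted(sample)
    return [ordered[(i * len(ordered)) // num_partitions] for i in range(1, num_partitions)]


def partition_of(key, boundaries):
    """Bucket lookup: index of the first boundary >= key, found by binary search."""
    return bisect.bisect_left(boundaries, key)


def imbalance(keys, sample_size, num_partitions, rng):
    """Size of the largest partition divided by the ideal size (1.0 is perfectly balanced)."""
    sample = rng.sample(keys, sample_size)
    boundaries = range_boundaries(sample, num_partitions)
    sizes = Counter(partition_of(k, boundaries) for k in keys)
    ideal = len(keys) / num_partitions
    return max(sizes.values()) / ideal


rng = random.Random(42)
# Hypothetical skewed input: about 30% of records share one hot key, the rest are uniform.
keys = [500 if rng.random() < 0.3 else rng.randrange(1000) for _ in range(100_000)]

# Compare the imbalance for a few sample sizes.
for sample_size in (100, 1_000, 10_000):
    print(sample_size, round(imbalance(keys, sample_size, 8, rng), 2))
```

Because equal keys always land in the same partition, the partition that holds the hot key cannot be smaller than the hot key's share of the data, no matter how many samples are taken. In this setup a larger sample can only improve the balance of the remaining partitions.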
> [GitHub] Enable Range Partitioner
> ---------------------------------
>
> Key: FLINK-7
> URL: https://issues.apache.org/jira/browse/FLINK-7
> Project: Flink
> Issue Type: Sub-task
> Components: Distributed Runtime
> Reporter: GitHub Import
> Assignee: Chengxiang Li
> Fix For: pre-apache
>
>
> The range partitioner is currently disabled. We need to implement the
> following aspects:
> 1) Distribution information, if available, must be propagated back together
> with the ordering property.
> 2) A generic bucket lookup structure (currently specific to PactRecord).
> Tests to re-enable after fixing this issue:
> - TeraSortITCase
> - GlobalSortingITCase
> - GlobalSortingMixedOrderITCase
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/7
> Created by: [StephanEwen|https://github.com/StephanEwen]
> Labels: core, enhancement, optimizer
> Milestone: Release 0.4
> Assignee: [fhueske|https://github.com/fhueske]
> Created at: Fri Apr 26 13:48:24 CEST 2013
> State: open