[ 
https://issues.apache.org/jira/browse/FLINK-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066420#comment-15066420
 ] 

ASF GitHub Bot commented on FLINK-7:
------------------------------------

Github user ChengXiangLi commented on the pull request:

    https://github.com/apache/flink/pull/1255#issuecomment-166297616
  
    Sorry, @fhueske, I misunderstood your test data: the keys should be skewed 
toward some values, whereas in my previous test the keys were not skewed. It is 
complicated to calculate how many samples must be taken from a dataset to meet 
an a priori specified accuracy guarantee. One such algorithm is described at 
http://research.microsoft.com/pubs/159275/MSR-TR-2012-18.pdf, which I have used 
before, but it may not fully fit the case where the keys are skewed.
    Would you continue testing to see how many samples are required to make the 
partitions roughly balanced? Raising the sample count should not add much 
overhead, so I fully support it.


> [GitHub] Enable Range Partitioner
> ---------------------------------
>
>                 Key: FLINK-7
>                 URL: https://issues.apache.org/jira/browse/FLINK-7
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Distributed Runtime
>            Reporter: GitHub Import
>            Assignee: Chengxiang Li
>             Fix For: pre-apache
>
>
> The range partitioner is currently disabled. We need to implement the 
> following aspects:
> 1) Distribution information, if available, must be propagated back together 
> with the ordering property.
> 2) A generic bucket lookup structure (currently specific to PactRecord).
> Tests to re-enable after fixing this issue:
>  - TeraSortITCase
>  - GlobalSortingITCase
>  - GlobalSortingMixedOrderITCase
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/7
> Created by: [StephanEwen|https://github.com/StephanEwen]
> Labels: core, enhancement, optimizer, 
> Milestone: Release 0.4
> Assignee: [fhueske|https://github.com/fhueske]
> Created at: Fri Apr 26 13:48:24 CEST 2013
> State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
