[ 
https://issues.apache.org/jira/browse/CRUNCH-642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381737#comment-16381737
 ] 

Gabriel Reid commented on CRUNCH-642:
-------------------------------------

The fix version was incorrectly set to 0.15 (this patch was made after that 
release); updated it to release version 1.0.

> Enable numReducers option for methods in Distinct
> -------------------------------------------------
>
>                 Key: CRUNCH-642
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-642
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.14.0
>            Reporter: Xavier
>            Assignee: Josh Wills
>            Priority: Trivial
>             Fix For: 1.0.0
>
>         Attachments: 
> 0001-CRUNCH-642-Enable-GroupingOptions-for-Distinct-opera.patch, 
> CRUNCH-642-Enable-GroupingOptions-for-Distinct-operations.patch, 
> CRUNCH-642.patch
>
>
> The {{groupByKey}} invocation in the {{Distinct}} class currently uses the 
> default (recommended) number of reducers without providing an option to 
> override this:
> {code}
> public static <S> PCollection<S> distinct(PCollection<S> input, int flushEvery) {
>   Preconditions.checkArgument(flushEvery > 0);
>   PType<S> pt = input.getPType();
>   PTypeFamily ptf = pt.getFamily();
>   return input
>       .parallelDo("pre-distinct", new PreDistinctFn<S>(flushEvery, pt),
>           ptf.tableOf(pt, ptf.nulls()))
>       .groupByKey()
>       .parallelDo("post-distinct", new PostDistinctFn<S>(), pt);
> }
> {code}
> Would it be possible to enhance this method so that the number of reducers 
> can be customized, either explicitly or via a {{GroupingOptions}} object?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
