[ https://issues.apache.org/jira/browse/CRUNCH-642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381737#comment-16381737 ]

Gabriel Reid commented on CRUNCH-642:
-------------------------------------

Fix version was incorrectly set to 0.15 (this patch was made after that release), updated to release version 1.0.

> Enable numReducers option for methods in Distinct
> -------------------------------------------------
>
>                 Key: CRUNCH-642
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-642
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.14.0
>            Reporter: Xavier
>            Assignee: Josh Wills
>            Priority: Trivial
>             Fix For: 1.0.0
>
>         Attachments: 0001-CRUNCH-642-Enable-GroupingOptions-for-Distinct-opera.patch, CRUNCH-642-Enable-GroupingOptions-for-Distinct-operations.patch, CRUNCH-642.patch
>
>
> The {{groupByKey}} invocation in the {{Distinct}} class currently uses the default (recommended) number of reducers without providing an option to override this:
> {code}
>   public static <S> PCollection<S> distinct(PCollection<S> input, int flushEvery) {
>     Preconditions.checkArgument(flushEvery > 0);
>     PType<S> pt = input.getPType();
>     PTypeFamily ptf = pt.getFamily();
>     return input
>         .parallelDo("pre-distinct", new PreDistinctFn<S>(flushEvery, pt), ptf.tableOf(pt, ptf.nulls()))
>         .groupByKey()
>         .parallelDo("post-distinct", new PostDistinctFn<S>(), pt);
>   }
> {code}
> Would it be possible to enhance this method such that it is possible to customize the number of reducers? Either explicitly or via a {{GroupingOptions}} object.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
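
For readers unfamiliar with what the requested option controls, here is a minimal plain-Java sketch of a distinct operation with a caller-supplied reducer count. It is not Crunch code and not the attached patch: the class `DistinctSketch` and its methods are hypothetical, and the hash-partitioning step only stands in for the shuffle that {{groupByKey}} performs. The point it illustrates is that the reducer count changes how the work is split, not the set of distinct values returned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical model only: imitates pre-distinct -> groupByKey -> post-distinct
// using in-memory collections instead of Crunch PCollections.
public class DistinctSketch {

    // Stand-in for the shuffle: route every value to one of numReducers
    // partitions by hash, so equal values always land in the same partition.
    static <S> List<Set<S>> partition(Collection<S> input, int numReducers) {
        if (numReducers <= 0) {
            throw new IllegalArgumentException("numReducers must be positive");
        }
        List<Set<S>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new LinkedHashSet<>());
        }
        for (S value : input) {
            int p = Math.floorMod(Objects.hashCode(value), numReducers);
            partitions.get(p).add(value); // the set drops duplicates within a partition
        }
        return partitions;
    }

    // Stand-in for the post-distinct step: each partition emits its values once.
    static <S> Set<S> distinct(Collection<S> input, int numReducers) {
        Set<S> result = new LinkedHashSet<>();
        for (Set<S> part : partition(input, numReducers)) {
            result.addAll(part);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "a", "c", "b", "a");
        // Same distinct values either way; only the partitioning differs.
        System.out.println(distinct(words, 1));
        System.out.println(distinct(words, 3));
    }
}
```

In Crunch itself, {{PTable.groupByKey}} also has overloads that take a partition count or a {{GroupingOptions}} object, which is presumably what the request means by "explicitly or via a {{GroupingOptions}} object"; the exact signature added to {{Distinct}} should be checked against the attached patch.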