[ https://issues.apache.org/jira/browse/CRUNCH-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904607#comment-13904607 ]
Gabriel Reid commented on CRUNCH-347:
-------------------------------------
I think that Shard is indeed the best way to take care of something like this.
[~jgmath2000] regarding the granularity of crunch.max.reducers: PTable#groupByKey
(which triggers a reduce) can take the number of partitions as a parameter, which
lets you specify how many reducers are used for that specific reduce.
Does that resolve your issue with the reducer count granularity?
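
To illustrate why a per-reduce partition count controls the number of output files, here is a minimal plain-Java sketch. It does not use the Crunch API; it only models the assignment of keys to reducers, assuming the default Hadoop hash partitioning formula. The class and method names (PartitionSketch, partitionFor, assign) are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model, not Crunch code: each partition index stands for one
// reducer, and each reducer writes one output file.
class PartitionSketch {

    // Same formula as Hadoop's default HashPartitioner:
    // mask off the sign bit, then take the remainder.
    static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Groups keys by the partition (reducer) that would receive them.
    static Map<Integer, List<String>> assign(List<String> keys, int numPartitions) {
        Map<Integer, List<String>> files = new TreeMap<>();
        for (String key : keys) {
            files.computeIfAbsent(partitionFor(key, numPartitions),
                                  p -> new ArrayList<>()).add(key);
        }
        return files;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "c", "d", "e", "f");
        // Three partitions: the keys are spread over up to three output files.
        System.out.println(assign(keys, 3));
        // One partition: every key lands in the same, single output file.
        System.out.println(assign(keys, 1));
    }
}
```

With a partition count of 1 on only the final reduce, that one step produces a single file while earlier reduces in the job keep their own parallelism.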
> Allow writing of single file outputs
> ------------------------------------
>
> Key: CRUNCH-347
> URL: https://issues.apache.org/jira/browse/CRUNCH-347
> Project: Crunch
> Issue Type: New Feature
> Components: IO
> Affects Versions: 0.9.0
> Reporter: Jason Gauci
> Priority: Minor
>
> One of the outputs from our system needs to be a single file to support a
> system that is ingesting the data downstream. We currently run the job and
> then cat the output files together to create the final output, but it would
> be nice if we could pass a flag to the write(...) function to handle this
> case.
> Note that setting the number of reducers globally for the entire job doesn't
> work in this case because of the significant performance implications.