[ 
https://issues.apache.org/jira/browse/CRUNCH-673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid updated CRUNCH-673:
--------------------------------
    Attachment: CRUNCH-673.patch

> Sort fails when using more reducers than records
> ------------------------------------------------
>
>                 Key: CRUNCH-673
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-673
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Gabriel Reid
>            Priority: Minor
>         Attachments: CRUNCH-673.patch
>
>
> We've run into an issue where running Sort with more reducers than there 
> are records to be sorted fails.
> This happens when a large PCollection is filtered down to almost nothing 
> (say 10 records) and the filtered PCollection is passed to Sort. Sort 
> configures n reducers for the small PCollection because it doesn't realize 
> how aggressively the input has been filtered, so there may be, for example, 
> 20 reducers configured. Reservoir sampling is used to build the partition 
> definitions for the TotalOrderPartitioner, but because the filtered 
> PCollection contains only 10 records, only 10 partitions are defined for 
> the TotalOrderPartitioner. A precondition in TotalOrderPartitioner then 
> fails, because the number of partitions in the partitions file doesn't 
> match the number of configured reducers.
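
The mismatch described above can be reproduced in isolation with a small sketch. This is not Crunch or Hadoop code: the class and method names are hypothetical, the sampling step is replaced by a simple "pick distinct keys" stand-in, and the check assumes that n reducers require n - 1 split points, which is how I understand the TotalOrderPartitioner precondition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names exist in Crunch or Hadoop.
class PartitionCountSketch {

    // Stand-in for the sampling step: choose split points from the sampled keys.
    // With too few distinct keys, fewer than (numReducers - 1) split points come back.
    static List<Integer> chooseSplitPoints(List<Integer> sampledKeys, int numReducers) {
        List<Integer> sorted = new ArrayList<>(new TreeSet<>(sampledKeys));
        int wanted = numReducers - 1;
        if (wanted <= 0) {
            return new ArrayList<>();
        }
        if (sorted.size() <= wanted) {
            // Not enough records: every distinct key becomes a split point.
            return sorted;
        }
        List<Integer> splits = new ArrayList<>();
        for (int i = 1; i <= wanted; i++) {
            // Evenly spaced positions in the sorted sample.
            splits.add(sorted.get(i * sorted.size() / (wanted + 1)));
        }
        return splits;
    }

    // Assumed shape of the precondition: n reducers need exactly n - 1 split points.
    static void checkPartitionCount(int splitPointCount, int numReducers) {
        if (splitPointCount != numReducers - 1) {
            throw new IllegalStateException("Expected " + (numReducers - 1)
                    + " split points for " + numReducers
                    + " reducers, found " + splitPointCount);
        }
    }

    public static void main(String[] args) {
        List<Integer> keys = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            keys.add(i); // the 10 records left after filtering
        }
        int numReducers = 20; // configured without knowledge of the filtering
        List<Integer> splits = chooseSplitPoints(keys, numReducers);
        try {
            checkPartitionCount(splits.size(), numReducers);
            System.out.println("Partition count is consistent");
        } catch (IllegalStateException e) {
            System.out.println("Precondition failed: " + e.getMessage());
        }
    }
}
```

With 10 records and 20 reducers the stand-in can produce at most 10 split points, so the check fails. With 4 reducers the same input yields 3 split points and the check passes.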



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
