[ https://issues.apache.org/jira/browse/CRUNCH-673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabriel Reid updated CRUNCH-673:
--------------------------------
    Attachment: CRUNCH-673.patch

> Sort fails when using more reducers than records
> ------------------------------------------------
>
>                 Key: CRUNCH-673
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-673
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Gabriel Reid
>            Priority: Minor
>         Attachments: CRUNCH-673.patch
>
>
> We've run into an issue where running Sort with a number of reducers that is
> higher than the number of records to be sorted fails.
> The way in which this occurs is that a large PCollection is filtered down to
> almost nothing (say 10 records), and that filtered PCollection is passed in
> to Sort. Sort configures n reducers for the small PCollection (because it
> doesn't realize that it has been filtered so aggressively), so then there are
> for example 20 reducers configured. Reservoir sampling is used to build up
> the partition definitions for the TotalOrderPartitioner, but because there
> are only 10 records in the filtered PCollection, only 10 partitions are
> defined for the TotalOrderPartitioner. This then causes a precondition in
> TotalOrderPartitioner to fail, because the number of partitions in the
> partitions file doesn't match up with the number of configured reducers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
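
The failure described in the report can be illustrated with a small, self-contained sketch. It is a simplified model and not Crunch or Hadoop code: the class `SortPartitionSketch` and the methods `reservoirSample`, `checkPartitionCount` and `capReducers` are hypothetical names made up for this illustration. The check only mirrors the report's description (number of defined partitions must equal number of configured reducers), and the capping step is one possible guard, not necessarily what the attached patch does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class SortPartitionSketch {

    /** Reservoir sampling (Algorithm R): returns at most k items, fewer if the input is smaller than k. */
    static <T> List<T> reservoirSample(List<T> input, int k, Random rnd) {
        List<T> reservoir = new ArrayList<>();
        for (int i = 0; i < input.size(); i++) {
            if (i < k) {
                reservoir.add(input.get(i));
            } else {
                int j = rnd.nextInt(i + 1); // uniform in [0, i]
                if (j < k) {
                    reservoir.set(j, input.get(i));
                }
            }
        }
        return reservoir;
    }

    /** Hypothetical stand-in for the partitioner precondition described in the report. */
    static void checkPartitionCount(int definedPartitions, int numReducers) {
        if (definedPartitions != numReducers) {
            throw new IllegalStateException("Partitions defined: " + definedPartitions
                    + ", reducers configured: " + numReducers);
        }
    }

    /** One possible guard (an assumption, not taken from the patch): never use more reducers than partitions. */
    static int capReducers(int requestedReducers, int definedPartitions) {
        return Math.max(1, Math.min(requestedReducers, definedPartitions));
    }

    public static void main(String[] args) {
        // A PCollection filtered down to 10 records, modelled as a plain list.
        List<Integer> filtered = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            filtered.add(i);
        }
        int requestedReducers = 20;

        // Only 10 records exist, so the sample can define at most 10 partitions.
        List<Integer> sample = reservoirSample(filtered, requestedReducers, new Random(42));

        try {
            checkPartitionCount(sample.size(), requestedReducers);
        } catch (IllegalStateException e) {
            System.out.println("Precondition failed: " + e.getMessage());
        }

        // With the reducer count capped to the number of partitions, the check passes.
        int reducers = capReducers(requestedReducers, sample.size());
        checkPartitionCount(sample.size(), reducers);
        System.out.println("Reducers after capping: " + reducers);
    }
}
```

In this model the sample size is bounded by the record count, so any reducer count above 10 trips the check, which matches the 10-records and 20-reducers example in the report.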