[ https://issues.apache.org/jira/browse/CRUNCH-680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Micah Whitacre resolved CRUNCH-680.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: 1.0.0

> Kafka Source should split very large partitions
> -----------------------------------------------
>
>                 Key: CRUNCH-680
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-680
>             Project: Crunch
>          Issue Type: Improvement
>          Components: IO
>            Reporter: Andrew Olson
>            Assignee: Micah Whitacre
>            Priority: Minor
>             Fix For: 1.0.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> If a single Kafka partition has a very large number of messages, the map task
> for that partition can take a long time to run leading to potential timeout
> problems. We should limit the number of messages assigned to each split so
> that the workload is more evenly balanced.
> Based on our testing we have determined that 5 million messages should be a
> generally reasonable default for the maximum split size, with a configuration
> property (org.apache.crunch.kafka.split.max) provided to optionally override
> that value.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
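
For illustration only, the splitting described in the issue can be sketched as follows. This is not the code committed for CRUNCH-680. The class KafkaSplitSketch and the helper splitOffsets are hypothetical names, and java.util.Properties stands in for whatever configuration object the Kafka source actually reads. Only the property name and the 5 million default come from the issue text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class KafkaSplitSketch {
    // Property name and default value as stated in the issue description.
    static final String MAX_SPLIT_PROPERTY = "org.apache.crunch.kafka.split.max";
    static final long DEFAULT_MAX_SPLIT = 5_000_000L;

    /**
     * Hypothetical helper: cuts the offset range [startOffset, endOffset) of one
     * partition into consecutive ranges of at most maxPerSplit messages each.
     * Each returned array holds {start inclusive, end exclusive}.
     */
    static List<long[]> splitOffsets(long startOffset, long endOffset, long maxPerSplit) {
        if (maxPerSplit <= 0) {
            throw new IllegalArgumentException("maxPerSplit must be positive");
        }
        List<long[]> splits = new ArrayList<>();
        long start = startOffset;
        while (start < endOffset) {
            // Take a full chunk if more than maxPerSplit messages remain,
            // otherwise take whatever is left.
            long end = (endOffset - start > maxPerSplit) ? start + maxPerSplit : endOffset;
            splits.add(new long[] {start, end});
            start = end;
        }
        return splits;
    }

    public static void main(String[] args) {
        // Assumption: an empty Properties object plays the role of the job configuration,
        // so the default of 5 million applies unless the property is set.
        Properties conf = new Properties();
        long max = Long.parseLong(
                conf.getProperty(MAX_SPLIT_PROPERTY, String.valueOf(DEFAULT_MAX_SPLIT)));
        for (long[] range : splitOffsets(0L, 12_000_000L, max)) {
            System.out.println(range[0] + " - " + range[1]);
        }
    }
}
```

With the default limit, a partition holding 12 million messages yields three ranges instead of one, so no single map task is handed more than 5 million messages. Setting the property to a smaller value would produce more, shorter ranges.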