Interesting read, thanks for the link! On Thu, 19 May 2016 at 07:09 Dan Halperin <[email protected]> wrote:
> Hey folks, > > This morning, my colleagues Eugene & Malo posted *No shard left behind: > dynamic work rebalancing in Google Cloud Dataflow > < > https://cloud.google.com/blog/big-data/2016/05/no-shard-left-behind-dynamic-work-rebalancing-in-google-cloud-dataflow > >*. > This article discusses Cloud Dataflow’s solution to the well-known > straggler problem. > > In a large batch processing job with many tasks executing in parallel, some > of the tasks – the stragglers – can take a much longer time to complete > than others, perhaps due to imperfect splitting of the work into parallel > chunks when issuing the job. Typically, waiting for stragglers means that > the overall job completes later than it should, and may also reserve too > many machines that may be underutilized at the end. Cloud Dataflow’s > dynamic work rebalancing can mitigate stragglers in most cases. > > What I’d like to highlight for the Apache Beam (incubating) community is > that Cloud Dataflow’s dynamic work rebalancing is implemented using > *runner-specific* control logic on top of Beam’s *runner-independent* > BoundedSource > API > < > https://github.com/apache/incubator-beam/blob/9fa97fb2491bc784df53fb0f044409dbbc2af3d7/sdks/java/core/src/main/java/org/apache/beam/sdk/io/BoundedSource.java > >. > Specifically, to steal work from a straggler, a runner need only call the > reader’s splitAtFraction method. This will generate a new source containing > leftover work, and then the runner can pass that source off to another idle > worker. As Beam matures, I hope that other runners are interested in > figuring out whether these APIs can help them improve performance, > implementing dynamic work rebalancing, and collaborating on API changes > that will help solve other pain points. > > Dan > > (Also posted on Beam blog: > > http://beam.incubator.apache.org/blog/2016/05/18/splitAtFraction-method.html > ) >
