Sorry all, I messed up the link/reference numbers. Here is the same e-mail
with the reference numbers fixed.

Building on the work by Eugene et al. in Breaking the fusion barrier[2], I
propose[1] a way to support splitting of bundles (primarily for
SplittableDoFn) within the portability layer. This also builds on a lot of
past work[3, 4, 5, 6, 7] related to splitting.

Note that this proposal[1] discusses the portability API changes and the
"control" flow needed. It also discusses implementation details recommended
for SDKs and runners. Interestingly, I believe there is a way to provide a
limited form of dynamic work rebalancing for all runners[8] that exist
today, which runners should be able to extend easily into a more meaningful
solution. Until it is implemented and tried out, though, it is hard to say
what gains, if any, there could be.

Note that follow-up proposals/discussions about any SplittableDoFn API
changes specific to each language implementation should come from those
interested in getting SplittableDoFn working with portability. A few such
changes are needed to support backlog reporting/splitting at backlog[7]
and also bundle finalization[9].

This topic has a lot of historical context so I apologize upfront for the
complicated read, but feel free to comment on the doc or this thread.

1: https://s.apache.org/beam-checkpoint-and-split-bundles
2: https://s.apache.org/splittable-do-fn
3: https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html
4: http://s.apache.org/beam-breaking-fusion
5: http://s.apache.org/textio-sdf
6: http://s.apache.org/splittable-do-fn-python-sdk
7: https://s.apache.org/beam-bundles-backlog-splitting
8: https://docs.google.com/document/d/1cKOB9ToasfYs1kLWQgffzvIbJx2Smy4svlodPRhFrk4/edit#heading=h.wkwslng744mv
9: https://s.apache.org/beam-finalizing-bundles

On Fri, Oct 26, 2018 at 3:07 PM Lukasz Cwik <[email protected]> wrote:

> I build off of the work performed by Eugene et al. within Breaking the
> fusion barrier[2] and propose[1] a way of how to support splitting of
> bundles (primarily for SplittableDoFn) within the portability layer. This
> also builds off of a lot of past work[3, 4, 5, 6, 7] related to splitting.
>
> Note that this proposal[1] discusses the portability API changes and
> "control" flow needed. It also discusses implementation details recommended
> during implementation by SDKs and runners. Interestingly, I believe there
> is a way to have a limited form of dynamic work rebalancing for all
> runners[8] that exist today that should be easily extensible by Runners to
> provide a meaningful solution but until implemented and tried out, hard to
> say what gains if any there could be.
>
> Note that follow-up proposals/discussions about any SplittableDoFn API
> changes specific to each language implementation should follow by those
> interested in getting SplittableDoFn working with portability. There are a
> few that are needed to support backlog reporting/splitting at backlog[6]
> and also bundle finalization[9].
>
> This topic has a lot of historical context so I apologize upfront for the
> complicated read, but feel free to comment on the doc or this thread.
>
> 1: https://s.apache.org/splittable-do-fn
> 2: https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html
> 3: http://s.apache.org/beam-breaking-fusion
> 4: http://s.apache.org/textio-sdf
> 5: http://s.apache.org/splittable-do-fn-python-sdk
> 6: https://s.apache.org/beam-bundles-backlog-splitting
> 7: https://s.apache.org/beam-checkpoint-and-split-bundles
> 8:
> https://docs.google.com/document/d/1cKOB9ToasfYs1kLWQgffzvIbJx2Smy4svlodPRhFrk4/edit#heading=h.wkwslng744mv
> 9: https://s.apache.org/beam-finalizing-bundles
>
