I wanted to let you know that I will be continuing from where Eugene left off
with SplittableDoFn. I know that many of you have done a bunch of work with
IOs and/or runner integration for SplittableDoFn and would appreciate your
help in advancing this awesome idea. If you have questions or things you
want to get reviewed related to SplittableDoFn, feel free to send them my
way or include me on anything SplittableDoFn related.

I was part of several discussions with Eugene and I think the biggest
outstanding design portion is to figure out how dynamic work rebalancing
would play out with the portability APIs. This includes reporting of
progress from within a bundle. I know that Eugene had shared some documents
in this regard but the position / split models didn't work too cleanly in a
unified sense for bounded and unbounded SplittableDoFns. It will likely
take me a while to gather my thoughts, but I could use your expertise on how
compatible these ideas are with IOs and runners
(Flink/Spark/Dataflow/Samza/Apex/...), and obviously your help during
implementation.
