For the 2.25 release, the Java Direct, Flink, Jet, Samza, and Twister2
runners will use SDF-powered Read transforms. Users can opt out
with --experiments=use_deprecated_read.
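
As a rough illustration of the opt-in/opt-out behaviour discussed in this
thread, here is a minimal, self-contained Java sketch. It is not Beam's
actual code: the class ReadExperimentSketch, its two methods, the simplified
--experiments parsing, and the rule that an opt-out wins over an opt-in are
all assumptions made for illustration. Only the experiment names come from
this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not part of the Beam SDK.
final class ReadExperimentSketch {
  // Experiment names quoted in this thread (new names plus the portable aliases).
  static final Set<String> OPT_IN =
      new HashSet<>(Arrays.asList("use_sdf_read", "beam_fn_api"));
  static final Set<String> OPT_OUT =
      new HashSet<>(Arrays.asList("use_deprecated_read", "beam_fn_api_use_deprecated_read"));

  // Simplified parsing (assumption): collects the comma-separated values of
  // every --experiments=... argument and ignores all other arguments.
  static List<String> parseExperiments(String[] args) {
    List<String> experiments = new ArrayList<>();
    String prefix = "--experiments=";
    for (String arg : args) {
      if (arg.startsWith(prefix)) {
        for (String name : arg.substring(prefix.length()).split(",")) {
          if (!name.isEmpty()) {
            experiments.add(name);
          }
        }
      }
    }
    return experiments;
  }

  // Decides whether a pipeline would use the SDF-powered Read.
  // Assumption: an explicit opt-out takes precedence over an opt-in,
  // and with neither present the runner's default applies.
  static boolean usesSdfRead(boolean runnerDefaultsToSdf, List<String> experiments) {
    for (String experiment : experiments) {
      if (OPT_OUT.contains(experiment)) {
        return false;
      }
    }
    for (String experiment : experiments) {
      if (OPT_IN.contains(experiment)) {
        return true;
      }
    }
    return runnerDefaultsToSdf;
  }

  public static void main(String[] args) {
    // Model a runner that defaults to SDF-powered reads, as the runners
    // listed above do for 2.25.
    boolean sdf = usesSdfRead(true, parseExperiments(args));
    System.out.println(sdf ? "SDF-powered Read" : "deprecated primitive Read");
  }
}
```

Running the sketch with --experiments=use_deprecated_read selects the
deprecated path; running it with no arguments keeps the SDF-powered default.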

I was unable to get Spark done for 2.25 because I found out that Spark
streaming doesn't support watermark holds[1]. If someone knows more about the
watermark system in Spark, I could use some guidance here: I believe I
have a version of unbounded SDF support written for Spark (the tests produce
all the expected output, but watermarks aren't being held back, so
PAssert fails).

Also, I started a doc[2] to produce an updated blog post since the original
SplittableDoFn blog from 2017 is out of date[3]. I was thinking of making
this a new blog post and having the old blog post point to it. We could
also remove the old blog post and/or update it. Any thoughts?

Finally, I have a clean-up PR[4] that pushes the Read -> primitive Read
expansion into each of the runners instead of keeping it inside the Read
transform in beam-sdks-java-core.

1: https://github.com/apache/beam/pull/12603
2: http://doc/1kpn0RxqZaoacUPVSMYhhnfmlo8fGT-p50fEblaFr2HE
3: https://beam.apache.org/blog/splittable-do-fn/
4: https://github.com/apache/beam/pull/13006


On Fri, Aug 28, 2020 at 1:45 AM Maximilian Michels <m...@apache.org> wrote:

> Thanks Luke! I've had a pass.
>
> -Max
>
> On 28.08.20 01:22, Luke Cwik wrote:
> > As an update.
> >
> > Direct and Twister2 are done.
> > Samza: is ready for review[1].
> > Flink: is almost ready for review. [2] lays all the groundwork for the
> > migration and [3] finishes the migration (there is a timeout happening
> > in FlinkSubmissionTest that I'm trying to figure out).
> > No further updates on Spark[4] or Jet[5].
> >
> > @Maximilian Michels <mailto:m...@apache.org> or @t...@apache.org
> > <mailto:thomas.we...@gmail.com>, can either of you take a look at the
> > Flink PRs?
> > @ke.wu...@icloud.com <mailto:ke.wu...@icloud.com>, since Xinyu delegated
> > to you, can you take another look at the Samza PR?
> >
> > 1: https://github.com/apache/beam/pull/12617
> > 2: https://github.com/apache/beam/pull/12706
> > 3: https://github.com/apache/beam/pull/12708
> > 4: https://github.com/apache/beam/pull/12603
> > 5: https://github.com/apache/beam/pull/12616
> >
> > On Tue, Aug 18, 2020 at 11:42 AM Pulasthi Supun Wickramasinghe
> > <pulasthi...@gmail.com <mailto:pulasthi...@gmail.com>> wrote:
> >
> >     Hi Luke
> >
> >     Will take a look at this as soon as possible and get back to you.
> >
> >     Best Regards,
> >     Pulasthi
> >
> >     On Tue, Aug 18, 2020 at 2:30 PM Luke Cwik <lc...@google.com
> >     <mailto:lc...@google.com>> wrote:
> >
> >         I have made some good progress here and have gotten to the
> >         following state for non-portable runners:
> >
> >         DirectRunner[1]: Merged. Supports Read.Bounded and
> >         Read.Unbounded.
> >         Twister2[2]: Ready for review. Supports Read.Bounded, the
> >         current runner doesn't support unbounded pipelines.
> >         Spark[3]: WIP. Supports Read.Bounded, Nexmark suite passes. Not
> >         certain about level of unbounded pipeline support coverage since
> >         Spark uses its own tiny suite of tests to get unbounded pipeline
> >         coverage instead of the validates runner set.
> >         Jet[4]: WIP. Supports Read.Bounded. Read.Unbounded definitely
> >         needs additional work.
> >         Samza[5]: WIP. Supports Read.Bounded. Not certain about level of
> >         unbounded pipeline support coverage since Spark uses its own
> >         tiny suite of tests to get unbounded pipeline coverage instead
> >         of the validates runner set.
> >         Flink: Unstarted.
> >
> >         @Pulasthi Supun Wickramasinghe <mailto:pulasthi...@gmail.com> ,
> >         can you help me with the Twister2 PR[2]?
> >         @Ismaël Mejía <mailto:ieme...@gmail.com>, is PR[3] the expected
> >         level of support for unbounded pipelines and hence ready for
> >         review?
> >         @Jozsef Bartok <mailto:jo...@hazelcast.com>, can you help me out
> >         to get support for unbounded splittable DoFns into Jet[4]?
> >         @Xinyu Liu <mailto:xinyuliu...@gmail.com>, is PR[5] the expected
> >         level of support for unbounded pipelines and hence ready for
> >         review?
> >
> >         1: https://github.com/apache/beam/pull/12519
> >         2: https://github.com/apache/beam/pull/12594
> >         3: https://github.com/apache/beam/pull/12603
> >         4: https://github.com/apache/beam/pull/12616
> >         5: https://github.com/apache/beam/pull/12617
> >
> >         On Tue, Aug 11, 2020 at 10:55 AM Luke Cwik <lc...@google.com
> >         <mailto:lc...@google.com>> wrote:
> >
> >             There shouldn't be any changes required since the wrapper
> >             will smoothly transition the execution to be run as an SDF.
> >             New IOs should strongly prefer to use SDF since it should be
> >             simpler to write and will be more flexible but they can use
> >             the "*Source"-based APIs. Eventually we'll deprecate the
> >             APIs but we will never stop supporting them. Eventually they
> >             should all be migrated to use SDF and if there is another
> >             major Beam version, we'll finally be able to remove them.
> >
> >             On Tue, Aug 11, 2020 at 8:40 AM Alexey Romanenko
> >             <aromanenko....@gmail.com <mailto:aromanenko....@gmail.com>>
> >             wrote:
> >
> >                 Hi Luke,
> >
> >                 Great to hear about such progress on this!
> >
> >                 Talking about opt-out for all runners in the future,
> >                 will it require any code changes for current
> >                 “*Source”-based IOs, or should the wrappers completely
> >                 smooth this transition?
> >                 Do we need to require that new IOs be based only on
> >                 SDF, or, again, should the wrappers help to avoid this?
> >
> >>                 On 10 Aug 2020, at 22:59, Luke Cwik <lc...@google.com
> >>                 <mailto:lc...@google.com>> wrote:
> >>
> >>                 In the past couple of months wrappers[1, 2] have been
> >>                 added to the Beam Java SDK which can execute
> >>                 BoundedSource and UnboundedSource as Splittable DoFns.
> >>                 These have been opt-out for portable pipelines (e.g.
> >>                 Dataflow runner v2, XLang pipelines on Flink/Spark)
> >>                 and opt-in using an experiment for all other pipelines.
> >>
> >>                 I would like to start making non-portable pipelines
> >>                 opt-out, starting with the DirectRunner[3], with the
> >>                 plan that eventually all runners will only execute
> >>                 splittable DoFns and the BoundedSource/UnboundedSource
> >>                 specific execution logic will be removed from the
> >>                 runners.
> >>
> >>                 Users will be able to opt-in any pipeline using the
> >>                 experiment 'use_sdf_read' and opt-out with the
> >>                 experiment 'use_deprecated_read'. (For portable
> >>                 pipelines these experiments were 'beam_fn_api' and
> >>                 'beam_fn_api_use_deprecated_read' respectively and I
> >>                 have added these two additional aliases to make the
> >>                 experience less confusing).
> >>
> >>                 1: https://github.com/apache/beam/blob/af1ce643d8fde5352d4519a558de4a2dfd24721d/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Read.java#L275
> >>                 2: https://github.com/apache/beam/blob/af1ce643d8fde5352d4519a558de4a2dfd24721d/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Read.java#L449
> >>                 3: https://github.com/apache/beam/pull/12519
> >
> >
> >
> >     --
> >     Pulasthi S. Wickramasinghe
> >     PhD Candidate  | Research Assistant
> >     School of Informatics and Computing | Digital Science Center
> >     Indiana University, Bloomington
> >     cell: 224-386-9035
> >
>
