Thanks Luke! I've had a pass.
-Max
On 28.08.20 01:22, Luke Cwik wrote:
As an update.
Direct and Twister2 are done.
Samza: is ready for review[1].
Flink: is almost ready for review. [2] lays all the groundwork for the
migration and [3] finishes the migration (there is a timeout happening
in FlinkSubmissionTest that I'm trying to figure out).
No further updates on Spark[4] or Jet[5].
@Maximilian Michels <mailto:m...@apache.org> or @t...@apache.org
<mailto:thomas.we...@gmail.com>, can either of you take a look at the
Flink PRs?
@ke.wu...@icloud.com <mailto:ke.wu...@icloud.com>, Since Xinyu delegated
to you, can you take another look at the Samza PR?
1: https://github.com/apache/beam/pull/12617
2: https://github.com/apache/beam/pull/12706
3: https://github.com/apache/beam/pull/12708
4: https://github.com/apache/beam/pull/12603
5: https://github.com/apache/beam/pull/12616
On Tue, Aug 18, 2020 at 11:42 AM Pulasthi Supun Wickramasinghe
<pulasthi...@gmail.com <mailto:pulasthi...@gmail.com>> wrote:
Hi Luke
Will take a look at this as soon as possible and get back to you.
Best Regards,
Pulasthi
On Tue, Aug 18, 2020 at 2:30 PM Luke Cwik <lc...@google.com
<mailto:lc...@google.com>> wrote:
I have made some good progress here and have gotten to the
following state for non-portable runners:
DirectRunner[1]: Merged. Supports Read.Bounded and Read.Unbounded.
Twister2[2]: Ready for review. Supports Read.Bounded, the
current runner doesn't support unbounded pipelines.
Spark[3]: WIP. Supports Read.Bounded, Nexmark suite passes. Not
certain about level of unbounded pipeline support coverage since
Spark uses its own tiny suite of tests to get unbounded pipeline
coverage instead of the validates runner set.
Jet[4]: WIP. Supports Read.Bounded. Read.Unbounded definitely
needs additional work.
Sazma[5]: WIP. Supports Read.Bounded. Not certain about level of
unbounded pipeline support coverage since Spark uses its own
tiny suite of tests to get unbounded pipeline coverage instead
of the validates runner set.
Flink: Unstarted.
@Pulasthi Supun Wickramasinghe <mailto:pulasthi...@gmail.com> ,
can you help me with the Twister2 PR[2]?
@Ismaël Mejía <mailto:ieme...@gmail.com>, is PR[3] the expected
level of support for unbounded pipelines and hence ready for review?
@Jozsef Bartok <mailto:jo...@hazelcast.com>, can you help me out
to get support for unbounded splittable DoFn's into Jet[4]?
@Xinyu Liu <mailto:xinyuliu...@gmail.com>, is PR[5] the expected
level of support for unbounded pipelines and hence ready for review?
1: https://github.com/apache/beam/pull/12519
2: https://github.com/apache/beam/pull/12594
3: https://github.com/apache/beam/pull/12603
4: https://github.com/apache/beam/pull/12616
5: https://github.com/apache/beam/pull/12617
On Tue, Aug 11, 2020 at 10:55 AM Luke Cwik <lc...@google.com
<mailto:lc...@google.com>> wrote:
There shouldn't be any changes required since the wrapper
will smoothly transition the execution to be run as an SDF.
New IOs should strongly prefer to use SDF since it should be
simpler to write and will be more flexible but they can use
the "*Source"-based APIs. Eventually we'll deprecate the
APIs but we will never stop supporting them. Eventually they
should all be migrated to use SDF and if there is another
major Beam version, we'll finally be able to remove them.
On Tue, Aug 11, 2020 at 8:40 AM Alexey Romanenko
<aromanenko....@gmail.com <mailto:aromanenko....@gmail.com>>
wrote:
Hi Luke,
Great to hear about such progress on this!
Talking about opt-out for all runners in the future,
will it require any code changes for current
“*Source”-based IOs or the wrappers should completely
smooth this transition?
Do we need to require to create new IOs only based on
SDF or again, the wrappers should help to avoid this?
On 10 Aug 2020, at 22:59, Luke Cwik <lc...@google.com
<mailto:lc...@google.com>> wrote:
In the past couple of months wrappers[1, 2] have been
added to the Beam Java SDK which can execute
BoundedSource and UnboundedSource as Splittable DoFns.
These have been opt-out for portable pipelines (e.g.
Dataflow runner v2, XLang pipelines on Flink/Spark)
and opt-in using an experiment for all other pipelines.
I would like to start making the non-portable
pipelines starting with the DirectRunner[3] to be
opt-out with the plan that eventually all runners will
only execute splittable DoFns and the
BoundedSource/UnboundedSource specific execution logic
from the runners will be removed.
Users will be able to opt-in any pipeline using the
experiment 'use_sdf_read' and opt-out with the
experiment 'use_deprecated_read'. (For portable
pipelines these experiments were 'beam_fn_api' and
'beam_fn_api_use_deprecated_read' respectively and I
have added these two additional aliases to make the
experience less confusing).
1:
https://github.com/apache/beam/blob/af1ce643d8fde5352d4519a558de4a2dfd24721d/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Read.java#L275
2:
https://github.com/apache/beam/blob/af1ce643d8fde5352d4519a558de4a2dfd24721d/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Read.java#L449
3: https://github.com/apache/beam/pull/12519
--
Pulasthi S. Wickramasinghe
PhD Candidate | Research Assistant
School of Informatics and Computing | Digital Science Center
Indiana University, Bloomington
cell: 224-386-9035 <tel:(224)%20386-9035>