[
https://issues.apache.org/jira/browse/BEAM-10670?focusedWorklogId=475425&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-475425
]
ASF GitHub Bot logged work on BEAM-10670:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 27/Aug/20 17:10
Start Date: 27/Aug/20 17:10
Worklog Time Spent: 10m
Work Description: lukecwik edited a comment on pull request #12617:
URL: https://github.com/apache/beam/pull/12617#issuecomment-682077852
> > @lukecwik : Ke from samza side will help take a look. Thanks!
>
> @kw2542 If we want to support unbounded splittable DoFns using the
non-portable execution then we'll need to support
[GBKIntoKeyedWorkItem](https://github.com/apache/beam/blob/ecfc389838400721b2a0379a9655969eed3dbf57/runners/core-java/src/main/java/org/apache/beam/runners/core/SplittableParDoViaKeyedWorkItems.java#L79).
>
> I see that there is
[KvToKeyedWorkItemOp](https://github.com/apache/beam/blob/master/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/KvToKeyedWorkItemOp.java)
but it doesn't output any timers that need to fire which is something that the
underlying [splittable dofn
implementation](https://github.com/apache/beam/blob/ecfc389838400721b2a0379a9655969eed3dbf57/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDo.java#L235)
is relying on. The timer firing seems to be done by
[GroupByKeyOp](https://github.com/apache/beam/blob/ecfc389838400721b2a0379a9655969eed3dbf57/runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/GroupByKeyOp.java#L225).
>
> Is this something you can help me with? (feel free to open PRs against [my
repo](https://github.com/lukecwik/incubator-beam/tree/beam10670.3) and or
provide suggestions on this PR)
I worked through the translation logic and was able to get unbounded
splittable dofn tests to pass. The things that don't work are:
* side inputs for unbounded splittable dofns (unboundedsources couldn't have
side inputs so this has feature parity)
* bundle finalization (was already unsupported) and the current
UnboundedSourceSystem doesn't support finalizing checkpoints
It also looks like I can't test unbounded splittable dofns in the global
window since PAssert doesn't seem to work for Samza in an unbounded pipeline
running in the global window. I can manually see that output is being produced
via a log statement in the output manager.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 475425)
Time Spent: 11h 40m (was: 11.5h)
> Make non-portable Splittable DoFn the only option when executing Java "Read"
> transforms
> ---------------------------------------------------------------------------------------
>
> Key: BEAM-10670
> URL: https://issues.apache.org/jira/browse/BEAM-10670
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Reporter: Luke Cwik
> Assignee: Luke Cwik
> Priority: P2
> Time Spent: 11h 40m
> Remaining Estimate: 0h
>
> All runners seem to be capable of migrating to splittable DoFn for
> non-portable execution except for Dataflow runner v1 which will internalize
> the current primitive read implementation that is shared across runner
> implementations.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)