[
https://issues.apache.org/jira/browse/BEAM-3515?focusedWorklogId=99381&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-99381
]
ASF GitHub Bot logged work on BEAM-3515:
----------------------------------------
Author: ASF GitHub Bot
Created on: 08/May/18 05:03
Start Date: 08/May/18 05:03
Worklog Time Spent: 10m
Work Description: jkff commented on issue #5277: [BEAM-3515] Portable
translation of SplittableProcessKeyed
URL: https://github.com/apache/beam/pull/5277#issuecomment-387285248
To Luke's question:
> Do you expect to turn this SplittableProcessKeyed back into a ParDoPayload
for use when executing or do you expect that all SDKs will understand
SplittableProcessKeyed?
Yes, I expect this to be a new instruction, with payload
SplittableProcessKeyedPayload, that all SDKs are required to understand.
To your latest comment:
I think I'm missing something, because I still can't figure out what, if
anything, we are disagreeing on. There is no inspection of composite structure
involved; I think the path I have in mind for SDF is identical to what you
described for Combine, so there's nothing novel about it.
I am assuming that runners will typically massage the received pipeline
proto and then execute it (e.g. Spark would translate it to RDDs). The massage
(surgery) part may include things like fusion, combiner lifting, etc.,
implemented as libraries used by runners. Runners are, of course, also free to
do these things without using the libraries, to do different things, or to not
do them at all. But one of the reusable surgery instruments would be the
standard expansion of ParDo(SDF), which would produce, among other things, an
SPK transform.
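To make the "surgery instrument" idea concrete, here is a minimal sketch of
such a rewrite. The pipeline proto is modeled as plain dicts, and every URN,
field name, and transform name below is illustrative only — this is not the
actual Beam proto schema or expansion, just the shape of the idea:

```python
# Hypothetical sketch of pipeline-proto surgery: rewriting a splittable
# ParDo into a sub-graph that ends in an SPK (SplittableProcessKeyed)
# transform. All URNs and field names are illustrative placeholders.

def expand_splittable_pardo(pipeline):
    """Rewrite every ParDo whose payload has is_splittable=True into a
    small sub-graph ending in an SPK transform."""
    new_transforms = {}
    for name, t in pipeline["transforms"].items():
        payload = t.get("payload", {})
        if t["urn"] == "beam:transform:pardo:v1" and payload.get("is_splittable"):
            # Standard expansion: pair each element with its initial
            # restriction, then hand the keyed pairs to SPK.
            pair_name = name + "/PairWithRestriction"
            new_transforms[pair_name] = {
                "urn": "beam:transform:sdf_pair_with_restriction:v1",  # illustrative
                "inputs": t["inputs"],
                "outputs": {"out": pair_name + ".out"},
            }
            new_transforms[name + "/SPK"] = {
                "urn": "beam:transform:sdf_process_keyed:v1",  # illustrative
                "payload": {"restriction_coder": payload["restriction_coder"]},
                "inputs": {"in": pair_name + ".out"},
                "outputs": t["outputs"],
            }
        else:
            new_transforms[name] = t
    return {**pipeline, "transforms": new_transforms}
```

The point is only that the expansion is a pure proto-to-proto function, so it
can live in a shared library that any runner applies (or skips) during its
surgery phase.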
So, eventual state:
* Pipeline proto contains a ParDoPayload with is_splittable=true and
restriction coder specified.
* SPK never appears in a proto that an SDK submits to a JobService.
* Most or all runners, during surgery on the received pipeline proto, use
the standard SDF expansion, which produces the SPK transform.
* After surgery is done, when executing the modified pipeline, most or all
runners will end up interpreting the SPK transform by sending an SPK
instruction to the SDK.
So SPK will exist on the runner side only (to the extent that all PTransform
overrides will eventually be runner-side only).
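The eventual-state invariants above could be expressed as small checks a
runner or test might apply, again over a dict-shaped proto with purely
illustrative URNs (not the real Beam ones):

```python
# Hypothetical invariant checks for the eventual state described above.
# Both URNs are illustrative placeholders, not the real Beam URNs.

SPK_URN = "beam:transform:sdf_process_keyed:v1"
PARDO_URN = "beam:transform:pardo:v1"

def check_submitted_pipeline(pipeline):
    """An SDK-submitted proto carries ParDo(is_splittable=True), never SPK."""
    for t in pipeline["transforms"].values():
        assert t["urn"] != SPK_URN, "SPK must not appear in a submitted proto"

def check_post_surgery_pipeline(pipeline):
    """After the standard SDF expansion, no splittable ParDos remain; each
    has been rewritten into (among other things) an SPK transform."""
    for t in pipeline["transforms"].values():
        if t["urn"] == PARDO_URN:
            assert not t.get("payload", {}).get("is_splittable"), \
                "splittable ParDo survived surgery"
```

In other words: is_splittable=true is the wire format between SDK and
JobService, and SPK is strictly a post-surgery, runner-side artifact.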
Current state: I'm just trying to get the portable Dataflow runner working
first. The Dataflow runner doesn't do pipeline proto surgery; it relies on
PTransform overrides, so it never produces a pipeline proto with a ParDoPayload
with is_splittable=true. Instead, after applying its PTransform overrides, it
produces the SPK transform directly and translates it to an SPK step in the
Job. Via various Dataflow backend and worker magic, the Dataflow runner, too,
will produce an SPK instruction for the SDK.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 99381)
Time Spent: 2h 40m (was: 2.5h)
> Use portable ParDoPayload for SDF in DataflowRunner
> ---------------------------------------------------
>
> Key: BEAM-3515
> URL: https://issues.apache.org/jira/browse/BEAM-3515
> Project: Beam
> Issue Type: Sub-task
> Components: runner-dataflow
> Reporter: Kenneth Knowles
> Assignee: Eugene Kirpichov
> Priority: Major
> Labels: portability
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>
> The Java-specific blobs transmitted to Dataflow need more context, in the
> form of portability framework protos.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)