[
https://issues.apache.org/jira/browse/BEAM-3515?focusedWorklogId=99381&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-99381
]
ASF GitHub Bot logged work on BEAM-3515:
----------------------------------------
Author: ASF GitHub Bot
Created on: 08/May/18 05:03
Start Date: 08/May/18 05:03
Worklog Time Spent: 10m
Work Description: jkff commented on issue #5277: [BEAM-3515] Portable
translation of SplittableProcessKeyed
URL: https://github.com/apache/beam/pull/5277#issuecomment-387285248
To Luke's question:
> Do you expect to turn this SplittableProcessKeyed back into a ParDoPayload
for use when executing or do you expect that all SDKs will understand
SplittableProcessKeyed?
Yes, I expect this to be a new instruction, with payload
SplittableProcessKeyedPayload, that all SDKs are required to understand.
To your latest comment:
I think I'm missing something, because I still can't figure out what, if
anything, we are disagreeing on. There is no inspection of composite structure
involved; I think the path I have in mind for SDF is identical to what you
described for Combine, so there's nothing novel about it.
I am assuming that runners will typically massage the received pipeline
proto and then execute it (e.g. Spark would translate it to RDDs). The massage
(surgery) part may include things like fusion, combiner lifting, etc.,
implemented as libraries used by runners. Runners are, of course, also free to
do these things without using the libraries, to do different things, or to not
do them at all. But one of the reusable surgery instruments would be the
standard expansion of ParDo(SDF), which would produce, among other things, an
SPK transform.
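To make the "surgery instrument" idea concrete, here is a minimal sketch of
such a rewrite. The pipeline proto is modeled as plain dicts, and every URN,
field name, and transform name below is illustrative only — this is not the
actual Beam proto schema or expansion, just the shape of the idea:

```python
# Hypothetical sketch of pipeline-proto surgery: rewriting a splittable
# ParDo into a sub-graph that ends in an SPK (SplittableProcessKeyed)
# transform. All URNs and field names are illustrative placeholders.

def expand_splittable_pardo(pipeline):
    """Rewrite every ParDo whose payload has is_splittable=True into a
    small sub-graph ending in an SPK transform."""
    new_transforms = {}
    for name, t in pipeline["transforms"].items():
        payload = t.get("payload", {})
        if t["urn"] == "beam:transform:pardo:v1" and payload.get("is_splittable"):
            # Standard expansion: pair each element with its initial
            # restriction, then hand the keyed pairs to SPK.
            pair_name = name + "/PairWithRestriction"
            new_transforms[pair_name] = {
                "urn": "beam:transform:sdf_pair_with_restriction:v1",  # illustrative
                "inputs": t["inputs"],
                "outputs": {"out": pair_name + ".out"},
            }
            new_transforms[name + "/SPK"] = {
                "urn": "beam:transform:sdf_process_keyed:v1",  # illustrative
                "payload": {"restriction_coder": payload["restriction_coder"]},
                "inputs": {"in": pair_name + ".out"},
                "outputs": t["outputs"],
            }
        else:
            new_transforms[name] = t
    return {**pipeline, "transforms": new_transforms}
```

The point is only that the expansion is a pure proto-to-proto function, so it
can live in a shared library that any runner applies (or skips) during its
surgery phase.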
So, eventual state:
* Pipeline proto contains a ParDoPayload with is_splittable=true and
restriction coder specified.
* SPK never appears in a proto that an SDK submits to a JobService.
* Most or all runners, during surgery on the received pipeline proto, use
the standard SDF expansion, which produces the SPK transform.
* After surgery is done, when executing the modified pipeline, most or all
runners will end up interpreting the SPK transform by sending an SPK
instruction to the SDK.
So SPK will exist on the runner side only (to the extent that all PTransform
overrides will eventually be runner-side only).
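The eventual-state invariants above could be expressed as small checks a
runner or test might apply, again over a dict-shaped proto with purely
illustrative URNs (not the real Beam ones):

```python
# Hypothetical invariant checks for the eventual state described above.
# Both URNs are illustrative placeholders, not the real Beam URNs.

SPK_URN = "beam:transform:sdf_process_keyed:v1"
PARDO_URN = "beam:transform:pardo:v1"

def check_submitted_pipeline(pipeline):
    """An SDK-submitted proto carries ParDo(is_splittable=True), never SPK."""
    for t in pipeline["transforms"].values():
        assert t["urn"] != SPK_URN, "SPK must not appear in a submitted proto"

def check_post_surgery_pipeline(pipeline):
    """After the standard SDF expansion, no splittable ParDos remain; each
    has been rewritten into (among other things) an SPK transform."""
    for t in pipeline["transforms"].values():
        if t["urn"] == PARDO_URN:
            assert not t.get("payload", {}).get("is_splittable"), \
                "splittable ParDo survived surgery"
```

In other words: is_splittable=true is the wire format between SDK and
JobService, and SPK is strictly a post-surgery, runner-side artifact.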
Current state: I'm just trying to get the portable Dataflow runner working
first. The Dataflow runner doesn't do pipeline proto surgery; it relies on
PTransform overrides, so it never produces a pipeline proto with a ParDoPayload
with is_splittable=true. Instead, after applying its PTransform overrides, it
produces the SPK transform directly and translates it to an SPK step in the
Job. Via various Dataflow backend and worker magic, the Dataflow runner, too,
will produce an SPK instruction for the SDK.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 99381)
Time Spent: 2h 40m (was: 2.5h)
> Use portable ParDoPayload for SDF in DataflowRunner
> ---------------------------------------------------
>
> Key: BEAM-3515
> URL: https://issues.apache.org/jira/browse/BEAM-3515
> Project: Beam
> Issue Type: Sub-task
> Components: runner-dataflow
> Reporter: Kenneth Knowles
> Assignee: Eugene Kirpichov
> Priority: Major
> Labels: portability
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>
> The Java-specific blobs transmitted to Dataflow need more context, in the
> form of portability framework protos.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)