[ 
https://issues.apache.org/jira/browse/BEAM-3515?focusedWorklogId=99214&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-99214
 ]

ASF GitHub Bot logged work on BEAM-3515:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 07/May/18 21:51
            Start Date: 07/May/18 21:51
    Worklog Time Spent: 10m 
      Work Description: jkff commented on issue #5277: [BEAM-3515] Portable 
translation of SplittableProcessKeyed
URL: https://github.com/apache/beam/pull/5277#issuecomment-387218166
 
 
   ParDo is a primitive in the same sense that Combine is a primitive: ideally, 
the pipeline proto would include the whole thing (e.g. a ParDo transform with 
`is_splittable=true`, or a Combine transform), and the runner would then do 
whatever it takes to evaluate it.
   
   In practice, runners would manipulate the proto and rewrite the Combine 
transform into a graph of PGBKCV etc., and perform combiner lifting and other 
optimizations. Likewise, runners would rewrite a splittable ParDo into the 
standard SDF expansion (pair with restriction, split restriction, 
SplittableProcessKeyed). So the SplittableProcessKeyed (SPK) primitive ought to 
exist in any case, at least for use by runners that would like to use this 
standard expansion, which I think in practice would be all runners. I think it 
would be misleading and useless to overload the ParDo primitive to also play 
the SPK role. SPK does not encapsulate the whole splittable ParDo, only the 
"hard" part that executes (unique key, element, restriction) tuples.
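   
   As a rough illustration of the three-step expansion described above, here is 
a minimal, framework-free Python sketch. It is not Beam code: the function 
names, the half-open offset-range restriction, and the `chunk_size` parameter 
are all hypothetical and chosen only to show where SPK sits relative to the 
other two steps.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical restriction type: a half-open offset range [start, end).
Restriction = Tuple[int, int]


def pair_with_restriction(
    elements: Iterable[str],
    initial_restriction: Callable[[str], Restriction],
) -> Iterator[Tuple[str, Restriction]]:
    """Step 1: attach an initial restriction to every element."""
    for element in elements:
        yield element, initial_restriction(element)


def split_restriction(
    pairs: Iterable[Tuple[str, Restriction]],
    chunk_size: int,
) -> Iterator[Tuple[str, Restriction]]:
    """Step 2: split each restriction into sub-ranges of at most chunk_size."""
    for element, (start, end) in pairs:
        for lo in range(start, end, chunk_size):
            yield element, (lo, min(lo + chunk_size, end))


def splittable_process_keyed(
    keyed: Iterable[Tuple[int, Tuple[str, Restriction]]],
    process: Callable[[str, Restriction], Iterable[str]],
) -> Iterator[str]:
    """Step 3 (the SPK role): run the user's processing function on each
    (unique key, element, restriction) tuple. The key is unused here; a real
    runner would use it to scope per-tuple state."""
    for _key, (element, restriction) in keyed:
        yield from process(element, restriction)


def expand_splittable_pardo(
    elements: Iterable[str],
    initial_restriction: Callable[[str], Restriction],
    process: Callable[[str, Restriction], Iterable[str]],
    chunk_size: int,
) -> List[str]:
    """Chain the three steps, assigning a unique key before the SPK step."""
    pairs = pair_with_restriction(elements, initial_restriction)
    split = split_restriction(pairs, chunk_size)
    keyed = enumerate(split)  # the enumeration index serves as the unique key
    return list(splittable_process_keyed(keyed, process))
```

   In this sketch, calling `expand_splittable_pardo` with strings as elements, 
`lambda s: (0, len(s))` as the initial restriction, and a processing function 
that emits the characters in the given range reproduces every character once, 
regardless of `chunk_size`. Only the last function in the chain corresponds to 
SPK; the first two steps are ordinary per-element transforms.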
   
   Note that right now we're still some distance from the vision above: both 
Combine expansion and SDF expansion are effectively done by the SDK rather than 
by the runner, in a language-specific way, via PTransform overrides. This will 
change for both transforms as portability matures and all graph surgery moves 
behind the JobService for all runners.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 99214)
    Time Spent: 1h 20m  (was: 1h 10m)

> Use portable ParDoPayload for SDF in DataflowRunner
> ---------------------------------------------------
>
>                 Key: BEAM-3515
>                 URL: https://issues.apache.org/jira/browse/BEAM-3515
>             Project: Beam
>          Issue Type: Sub-task
>          Components: runner-dataflow
>            Reporter: Kenneth Knowles
>            Assignee: Eugene Kirpichov
>            Priority: Major
>              Labels: portability
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The Java-specific blobs transmitted to Dataflow need more context, in the 
> form of portability framework protos.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
