[
https://issues.apache.org/jira/browse/BEAM-3515?focusedWorklogId=99214&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-99214
]
ASF GitHub Bot logged work on BEAM-3515:
----------------------------------------
Author: ASF GitHub Bot
Created on: 07/May/18 21:51
Start Date: 07/May/18 21:51
Worklog Time Spent: 10m
Work Description: jkff commented on issue #5277: [BEAM-3515] Portable
translation of SplittableProcessKeyed
URL: https://github.com/apache/beam/pull/5277#issuecomment-387218166
ParDo is a primitive in the same sense that Combine is a primitive: ideally, the
pipeline proto would include the whole thing (e.g. a ParDo transform with
`is_splittable=true`, or a Combine transform), and then the runner would do
whatever it takes to evaluate it.
In practice, runners would manipulate the proto and rewrite the Combine
transform into a graph of PGBKCV etc., and perform combiner lifting and other
things. Likewise, runners would rewrite a splittable ParDo into the standard
SDF expansion (pair with restriction, split restriction,
SplittableProcessKeyed). So the SPK primitive ought to exist in any case, at
least for use by runners that want to use this standard expansion, which
I think in practice would be all runners. I think it would be misleading and
useless to overload the ParDo primitive to also play the SPK role. SPK does not
encapsulate the whole splittable ParDo, only the "hard" part that executes
(unique key, element, restriction) tuples.
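
As a rough illustration of the rewrite described above, here is a minimal sketch in plain Python. It does not use Beam's pipeline protos or any Beam API: the `Transform` class, the helper names, and the intermediate collection names are all hypothetical, and the unique-key assignment that feeds SPK is left out for brevity. It only shows the shape of the graph surgery, where a ParDo marked `is_splittable` is replaced by the three-step expansion and everything else is left alone.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Transform:
    """Toy stand-in for a transform node in a pipeline graph (not Beam's proto)."""
    name: str
    kind: str  # e.g. "ParDo", "PairWithRestriction"
    inputs: List[str] = field(default_factory=list)
    output: str = ""
    is_splittable: bool = False


def expand_splittable_pardo(t: Transform) -> List[Transform]:
    """Rewrite one splittable ParDo into the three-step expansion.

    Any other transform is returned unchanged.
    """
    if not (t.kind == "ParDo" and t.is_splittable):
        return [t]
    # Hypothetical names for the intermediate collections.
    paired = f"{t.name}/paired"
    split = f"{t.name}/split"
    return [
        Transform(f"{t.name}/PairWithRestriction", "PairWithRestriction",
                  list(t.inputs), paired),
        Transform(f"{t.name}/SplitRestriction", "SplitRestriction",
                  [paired], split),
        # SPK is only the last step: it consumes the split restrictions and
        # produces the output the original ParDo would have produced.
        Transform(f"{t.name}/SplittableProcessKeyed", "SplittableProcessKeyed",
                  [split], t.output),
    ]


def rewrite_pipeline(transforms: List[Transform]) -> List[Transform]:
    """Apply the expansion to every transform, preserving order."""
    rewritten: List[Transform] = []
    for t in transforms:
        rewritten.extend(expand_splittable_pardo(t))
    return rewritten


pipeline = [
    Transform("Read", "ParDo", ["pc_in"], "pc_records", is_splittable=True),
    Transform("Format", "ParDo", ["pc_records"], "pc_out"),
]
rewritten = rewrite_pipeline(pipeline)
kinds = [t.kind for t in rewritten]
```

In this sketch the downstream `Format` ParDo still reads `pc_records`, so the rewrite is invisible to the rest of the graph, and the SPK node stands for only one of the three pieces that replace the splittable ParDo.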
Note that right now we're still some way from the vision above: both Combine
expansion and SDF expansion are effectively done by the SDK rather than by the
runner, in a language-specific way, via PTransform overrides. This will change
for both transforms as portability matures and all graph surgery is moved
behind the JobService for all runners.
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 99214)
Time Spent: 1h 20m (was: 1h 10m)
> Use portable ParDoPayload for SDF in DataflowRunner
> ---------------------------------------------------
>
> Key: BEAM-3515
> URL: https://issues.apache.org/jira/browse/BEAM-3515
> Project: Beam
> Issue Type: Sub-task
> Components: runner-dataflow
> Reporter: Kenneth Knowles
> Assignee: Eugene Kirpichov
> Priority: Major
> Labels: portability
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> The Java-specific blobs transmitted to Dataflow need more context, in the
> form of portability framework protos.