[
https://issues.apache.org/jira/browse/BEAM-3515?focusedWorklogId=99359&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-99359
]
ASF GitHub Bot logged work on BEAM-3515:
----------------------------------------
Author: ASF GitHub Bot
Created on: 08/May/18 04:06
Start Date: 08/May/18 04:06
Worklog Time Spent: 10m
Work Description: kennknowles commented on issue #5277: [BEAM-3515]
Portable translation of SplittableProcessKeyed
URL: https://github.com/apache/beam/pull/5277#issuecomment-387278147
Sorry, I reacted to your comment, not the particular change. I may have
missed the plan. +1 to Luke's question. From that comment it sounds like you
are treating the model right, but runners have to reassemble the appropriate
instruction for an SDK harness. The use of ParDoPayload in the execution side
is a pun, not a logical necessity.
But depending on the composite structure, or being vague about what is a
Beam primitive, got me concerned. My opinion is that inspection of the composite
structure of a transform is a catastrophic design error, and that the payload
should contain the full spec needed for an arbitrary implementation strategy.
COMBINE_PGBKCV should exist on the runner side only. The runner might insert
this before submission or when building a stage for the fn API. Put another
way, an SDK should be able to specify any correct composite structure it wants
for combine. I can't see a way to justify throwing away that abstraction since
it is so easy to do right and already done by ~every system.
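A minimal sketch of that distinction, assuming hypothetical names throughout
(Transform, translate, and the "example:..." URNs are illustrative stand-ins,
not Beam's protos or APIs). The runner dispatches only on a transform's URN and
payload; it descends into subtransforms only when it has no native translation,
and it never pattern-matches on how the SDK chose to expand the composite:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical stand-in for a pipeline transform; not a Beam class.
@dataclass
class Transform:
    urn: str        # identifies what the transform means
    payload: dict   # full spec needed to implement it, independent of expansion
    subtransforms: List["Transform"] = field(default_factory=list)


def translate(t: Transform, native: Dict[str, Callable[[dict], str]]) -> List[str]:
    """Translate using only URN and payload.

    If the runner knows the URN, the payload alone must be enough to build
    the step. Otherwise the runner falls back to translating whatever
    expansion the SDK supplied, without caring about its shape.
    """
    if t.urn in native:
        return [native[t.urn](t.payload)]
    if not t.subtransforms:
        raise ValueError(f"no translation for primitive {t.urn}")
    steps: List[str] = []
    for sub in t.subtransforms:
        steps.extend(translate(sub, native))
    return steps


# Two SDKs expand the same combine differently (both expansions are made up).
combine_a = Transform(
    "example:combine:v1", {"combine_fn": "sum"},
    [Transform("example:gbk:v1", {}),
     Transform("example:pardo:v1", {"fn": "sum_values"})],
)
combine_b = Transform(
    "example:combine:v1", {"combine_fn": "sum"},
    [Transform("example:pardo:v1", {"fn": "pre_combine"}),
     Transform("example:gbk:v1", {}),
     Transform("example:pardo:v1", {"fn": "post_combine"})],
)

# A runner with a native combine ignores both expansions.
combining_runner = {
    "example:combine:v1": lambda p: f"NativeCombine({p['combine_fn']})",
}

# A runner without one simply executes whichever expansion it was given.
basic_runner = {
    "example:gbk:v1": lambda p: "GroupByKey",
    "example:pardo:v1": lambda p: f"ParDo({p['fn']})",
}

steps_a = translate(combine_a, combining_runner)
steps_b = translate(combine_b, combining_runner)
fallback_b = translate(combine_b, basic_runner)
```

In this sketch both SDK expansions yield the same single native step on the
combining runner, while the basic runner just runs the expansion it received.
Any runner-only rewrite (such as a lifted combine) would be applied to the
result on the runner side rather than expected from the SDK.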
I'm totally cool with non-primitives and pseudo-primitives having
well-defined payloads. I definitely don't want to imply otherwise. I think an
explicit expansion of a splittable ParDoPayload to a composite with a
(runner-side) Splittable process keyed payload is just great. But no reason an
SDK should need to do anything novel. If so, you need to invent some execution
side instructions and embrace that they are fundamentally different than
high-level PCollection transforms.
Issue Time Tracking
-------------------
Worklog Id: (was: 99359)
Time Spent: 2.5h (was: 2h 20m)
> Use portable ParDoPayload for SDF in DataflowRunner
> ---------------------------------------------------
>
> Key: BEAM-3515
> URL: https://issues.apache.org/jira/browse/BEAM-3515
> Project: Beam
> Issue Type: Sub-task
> Components: runner-dataflow
> Reporter: Kenneth Knowles
> Assignee: Eugene Kirpichov
> Priority: Major
> Labels: portability
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> The Java-specific blobs transmitted to Dataflow need more context, in the
> form of portability framework protos.