[
https://issues.apache.org/jira/browse/BEAM-7726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997623#comment-16997623
]
Robert Burke commented on BEAM-7726:
------------------------------------
At this point state-backed iterables work and provide correct results, but
I'd like to augment them with one bit of performance tuning: page lookahead.
In particular, the protocol as it stands means the SDK spends time waiting for
the runner to complete IO whenever it reaches the end of a page, so even a
single page of lookahead will improve pipelining.
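To make the lookahead idea concrete, here is a minimal sketch of a reader that
keeps one page fetch in flight while the current page is being consumed. It is
not the Go SDK's implementation: pageFetcher, lookaheadReader, fakeFetch, and
the string continuation tokens are all hypothetical stand-ins for a state
channel read, chosen only to illustrate the pipelining.

```go
package main

import (
	"fmt"
	"io"
)

// pageFetcher is a hypothetical stand-in for one state read: given a
// continuation token it returns a page of values and the next token.
// An empty next token means there are no more pages (an assumption here).
type pageFetcher func(token string) (values []int, next string, err error)

type page struct {
	values []int
	next   string
	err    error
}

// lookaheadReader iterates over pages while keeping one fetch in flight.
type lookaheadReader struct {
	fetch   pageFetcher
	cur     []int
	pending chan page // nil once the last page has been received
}

func newLookaheadReader(fetch pageFetcher) *lookaheadReader {
	r := &lookaheadReader{fetch: fetch}
	r.prefetch("")
	return r
}

// prefetch starts fetching a page in the background. The channel is
// buffered so the goroutine can finish even if the reader is abandoned.
func (r *lookaheadReader) prefetch(token string) {
	ch := make(chan page, 1)
	r.pending = ch
	go func() {
		v, next, err := r.fetch(token)
		ch <- page{v, next, err}
	}()
}

// Next returns the next value, or io.EOF after the last page is drained.
func (r *lookaheadReader) Next() (int, error) {
	for len(r.cur) == 0 {
		if r.pending == nil {
			return 0, io.EOF
		}
		p := <-r.pending
		r.pending = nil
		if p.err != nil {
			return 0, p.err
		}
		r.cur = p.values
		if p.next != "" {
			// Start the following fetch before the caller consumes r.cur.
			r.prefetch(p.next)
		}
	}
	v := r.cur[0]
	r.cur = r.cur[1:]
	return v, nil
}

// fakeFetch serves three fixed pages; it exists only for this example.
func fakeFetch(token string) ([]int, string, error) {
	switch token {
	case "":
		return []int{1, 2}, "p2", nil
	case "p2":
		return []int{3}, "p3", nil
	case "p3":
		return []int{4, 5}, "", nil
	}
	return nil, "", fmt.Errorf("unknown token %q", token)
}

// collect drains the reader into a slice.
func collect(r *lookaheadReader) ([]int, error) {
	var out []int
	for {
		v, err := r.Next()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		out = append(out, v)
	}
}

func main() {
	got, err := collect(newLookaheadReader(fakeFetch))
	fmt.Println(got, err) // prints "[1 2 3 4 5] <nil>"
}
```

With a single page of lookahead, the wait at the end of a page shrinks to
whatever part of the fetch has not already overlapped with processing.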
There's a separate issue in the Go SDK where bundles subject to state-backed
iterables end up blocking the data channel if the runner sends the next
element to the data channel before the bundle is ready for it. The bundle
doesn't read from the data channel while it's processing from the state stream,
which eventually triggers the SDK's pushback, blocking other bundles on the
same worker.
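The blocking described above can be modelled with a toy example: one in-order
stream is demultiplexed into bounded per-bundle buffers, and delivery stops at
the first element whose bundle is not draining its buffer. This is only an
illustration under assumed names and sizes (elem, deliver, and buffers of
capacity 1 are hypothetical), not the SDK's actual data channel code.

```go
package main

import "fmt"

// elem is one element on the shared stream, tagged with its bundle.
type elem struct {
	bundle string
	value  int
}

// deliver hands elements to per-bundle bounded buffers in stream order and
// returns how many were delivered. It stops at the first element whose
// buffer is full, because a single in-order stream cannot skip ahead; a real
// reader would block at that point instead of returning.
func deliver(stream []elem, bufs map[string]chan int) int {
	for i, e := range stream {
		select {
		case bufs[e.bundle] <- e.value:
		default:
			return i
		}
	}
	return len(stream)
}

func main() {
	bufs := map[string]chan int{
		"A": make(chan int, 1),
		"B": make(chan int, 1),
	}
	// Bundle A is busy reading from the state stream and is not draining
	// its data buffer; bundle B is idle and ready for data.
	stream := []elem{{"A", 1}, {"A", 2}, {"B", 3}}

	n := deliver(stream, bufs)
	fmt.Println(n, len(bufs["B"])) // prints "1 0": B is starved behind A

	<-bufs["A"] // A finally reads one element from the data channel
	n += deliver(stream[n:], bufs)
	fmt.Println(n, len(bufs["B"])) // prints "3 1": B's element gets through
}
```

In this model bundle B receives nothing until bundle A reads, even though B's
own buffer is empty the whole time.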
Runners can solve this by terminating a bundle immediately after sending a
Large Iterable element. It's not sufficient to "wait" until the iterable is
done, since there's no signal the SDK can provide to say it's ready for more
from the data channel; it's already reading values.
Alternatively, I need to find out if the Go SDK is doing the data channel
"correctly" WRT multiplexing bundles. If runners can make use of multiple
streams from the SDK (one per bundle), then ordinary gRPC multiplexing should
prevent the block. However, offhand, it's not clear that could work, since the
SDK side of the connection doesn't have a "say" in which bundle is being used
at that point. That's not how bidirectional gRPC streams work: there's no
initial stream creation request to identify a stream as owned by a given
bundle. In this case, it's up to the runners, I guess.
> [Go SDK] State Backed Iterables
> -------------------------------
>
> Key: BEAM-7726
> URL: https://issues.apache.org/jira/browse/BEAM-7726
> Project: Beam
> Issue Type: Improvement
> Components: sdk-go
> Affects Versions: Not applicable
> Reporter: Robert Burke
> Assignee: Robert Burke
> Priority: Major
> Fix For: Not applicable
>
> Time Spent: 3h
> Remaining Estimate: 0h
>
> The Go SDK should support the State backed iterables protocol per the proto.
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L644]
>
> Primary case is for iterables after CoGBKs.