[ 
https://issues.apache.org/jira/browse/BEAM-7726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997623#comment-16997623
 ] 

Robert Burke commented on BEAM-7726:
------------------------------------

At this point State Backed Iterables work and provide correct results, but 
I'd like to augment them with one bit of performance tuning: Page Look Ahead.

In particular, the protocol as is means the SDK spends time waiting for the 
runner to complete IO each time it reaches the end of a page, and even a 
single page of lookahead would improve pipelining.

There's a separate issue in the Go SDK where bundles subject to state backed 
iterables end up blocking the data channel if the runner sends the next 
element to the data channel before the bundle is ready for it. The bundle 
doesn't read from the data channel while it's processing from the state 
stream, which eventually triggers the SDK's pushback, blocking other bundles 
on the same worker.

Runners can solve this by terminating a bundle immediately after sending a 
Large Iterable element. It's not sufficient to "wait" until the iterable is 
done, since there's no signal the SDK can provide to say it's ready for more 
from the data channel; it's already reading values.

Alternatively, I need to find out if the Go SDK is doing the data channel 
"correctly" WRT multiplexing bundles. If runners can make use of multiple 
streams from the SDK (one per bundle), then ordinary gRPC multiplexing should 
prevent the block. However, offhand, it's not clear that could work, since the 
SDK side of the connection doesn't have a "say" in which bundle is being used 
at that point. That's not how BiDi gRPC streams work: there's no initial 
stream creation request to identify a stream as owned by a given bundle. In 
this case, it's up to the runners, I guess.

> [Go SDK] State Backed Iterables
> -------------------------------
>
>                 Key: BEAM-7726
>                 URL: https://issues.apache.org/jira/browse/BEAM-7726
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-go
>    Affects Versions: Not applicable
>            Reporter: Robert Burke
>            Assignee: Robert Burke
>            Priority: Major
>             Fix For: Not applicable
>
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> The Go SDK should support the State backed iterables protocol per the proto.
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L644]
>  
> Primary case is for iterables after CoGBKs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
