[ 
https://issues.apache.org/jira/browse/BEAM-7726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004337#comment-17004337
 ] 

Robert Burke edited comment on BEAM-7726 at 2/4/20 10:46 PM:
-------------------------------------------------------------

The data channel is correctly multiplexing bundles. There's no other way to do 
the multiple streams thing in the current protocol and GRPC without the runner 
having multiple endpoints, or the process doing so (eg. Multiple SDK Harnesses 
per worker, which is how python handles it).

I think I have a resolution for state backed iterables blocking the 
datachannel, which will work for any runners that support datasource split 
requests. If the data channel is eventually split down to a the current value 
and no more, we can close the reader, which will cause the channel to be 
unblocked. Any buffered data will be drained. Care needs to be taken to avoid 
deadlocking or dataloss or race conditions, but there should only be lock 
contention  when the Split thread is closing the reader.

Edit (2020/02/04): I wasn't able to confirm that this actually worked better, 
and even though there was no material locking overhead, the additional 
complexity to that part of the code isn't worth  questionable benefits. Tabling 
for now. 


was (Author: lostluck):
The data channel is correctly multiplexing bundles. There's no other way to do 
the multiple streams thing in the current protocol and GRPC without the runner 
having multiple endpoints, or the process doing so (eg. Multiple SDK Harnesses 
per worker, which is how python handles it).

I think I have a resolution for state backed iterables blocking the 
datachannel, which will work for any runners that support datasource split 
requests. If the data channel is eventually split down to a the current value 
and no more, we can close the reader, which will cause the channel to be 
unblocked. Any buffered data will be drained. Care needs to be taken to avoid 
deadlocking or dataloss or race conditions, but there should only be lock 
contention  when the Split thread is closing the reader.

Edit: I wasn't able to confirm that this actually worked better, and even 
though there was no material locking overhead, the additional complexity to 
that part of the code isn't worth  questionable benefits. Tabling for now. 

> [Go SDK] State Backed Iterables
> -------------------------------
>
>                 Key: BEAM-7726
>                 URL: https://issues.apache.org/jira/browse/BEAM-7726
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-go
>    Affects Versions: Not applicable
>            Reporter: Robert Burke
>            Assignee: Robert Burke
>            Priority: Major
>             Fix For: Not applicable
>
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> The Go SDK should support the State backed iterables protocol per the proto.
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L644]
>  
> Primary case is for iterables after CoGBKs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to