[ https://issues.apache.org/jira/browse/BEAM-7726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004337#comment-17004337 ]
Robert Burke edited comment on BEAM-7726 at 2/4/20 10:46 PM:
-------------------------------------------------------------

The data channel is correctly multiplexing bundles. There's no other way to do multiple streams in the current protocol and gRPC without the runner having multiple endpoints, or the process doing so (e.g. multiple SDK harnesses per worker, which is how Python handles it).

I think I have a resolution for state backed iterables blocking the data channel, which will work for any runners that support datasource split requests. If the data source is eventually split down to the current value and no more, we can close the reader, which will cause the channel to be unblocked. Any buffered data will be drained. Care needs to be taken to avoid deadlocks, data loss, or race conditions, but there should only be lock contention when the Split thread is closing the reader.

Edit (2020/02/04): I wasn't able to confirm that this actually worked better, and even though there was no material locking overhead, the additional complexity in that part of the code isn't worth the questionable benefits. Tabling for now.

> [Go SDK] State Backed Iterables
> -------------------------------
>
>                 Key: BEAM-7726
>                 URL: https://issues.apache.org/jira/browse/BEAM-7726
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-go
>    Affects Versions: Not applicable
>            Reporter: Robert Burke
>            Assignee: Robert Burke
>            Priority: Major
>             Fix For: Not applicable
>
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> The Go SDK should support the State backed iterables protocol per the proto.
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L644]
>
> Primary case is for iterables after CoGBKs.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)