[ 
https://issues.apache.org/jira/browse/BEAM-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16675454#comment-16675454
 ] 

Bill Neubauer commented on BEAM-4124:
-------------------------------------

I agree with the consensus that the Beam code shouldn't be enforcing the 
transport limits, but it does need some awareness of them, and no mechanism 
for that currently exists.

Given what's been done to support larger messages in the Java code, the only 
current restriction is the code in datamgr.go that was previously identified. 
The intent of that code was to provide a friendlier error message than the 
one produced when the gRPC maximum message size is exceeded.

I think it's totally reasonable to remove the check in datamgr.go since it's 
providing zero value at this juncture.

More work does need to be done on a segmentation protocol for larger messages, 
and that protocol needs to be able to discover the limitations of the 
underlying transport (which could, for example, be Twirp instead of gRPC). It 
should also have some affordances for buffering data intelligently: enough to 
avoid thrashing the underlying transport, but not so much that 
latency-critical messages get delayed and cause problems in workflow execution. 
The RFCs for TCP provide extensive guidance for these scenarios, and any 
design doc for a flexible Beam transport layer should compare and contrast its 
proposals with TCP.
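
As a rough illustration of the segmentation idea, here is a minimal Go sketch. 
It is not code from Beam: segment, reassemble, and maxChunk are hypothetical 
names, and the limit is simply passed in rather than discovered from the 
transport, which is the part that still needs a design.

```go
package main

import "fmt"

// segment splits payload into chunks of at most maxChunk bytes.
// maxChunk stands in for whatever limit the transport would report
// (hypothetical; no discovery mechanism exists yet).
// The returned chunks share memory with payload; nothing is copied.
func segment(payload []byte, maxChunk int) ([][]byte, error) {
	if maxChunk <= 0 {
		return nil, fmt.Errorf("maxChunk must be positive, got %d", maxChunk)
	}
	var chunks [][]byte
	for len(payload) > 0 {
		n := maxChunk
		if len(payload) < n {
			n = len(payload)
		}
		chunks = append(chunks, payload[:n])
		payload = payload[n:]
	}
	return chunks, nil
}

// reassemble concatenates chunks back into a single payload on the
// receiving side.
func reassemble(chunks [][]byte) []byte {
	var out []byte
	for _, c := range chunks {
		out = append(out, c...)
	}
	return out
}

func main() {
	element := []byte("an element larger than the transport limit")
	chunks, err := segment(element, 16)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(chunks), string(reassemble(chunks)) == string(element))
}
```

A real protocol would also need framing (sequence numbers and an end-of-element 
marker) and the buffering policy described above; this sketch only shows the 
splitting step.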

Speaking to a point Robert made earlier, the hard limit on message size is 
imposed by the use of protocol buffers, so there can't be an element larger 
than 2 GB without the user's code doing something to manage the spillover.
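
To make that ceiling concrete, a small Go sketch of a size check with a 
descriptive error follows. It assumes the limit is 2 GiB minus one byte 
(math.MaxInt32), on the basis that protocol buffer implementations commonly 
track sizes as signed 32-bit integers; checkEncodedSize and maxProtoSize are 
hypothetical names, not part of Beam or the protobuf library.

```go
package main

import (
	"fmt"
	"math"
)

// maxProtoSize is an assumed ceiling for one serialized protocol buffer:
// 2 GiB minus one byte.
const maxProtoSize int64 = math.MaxInt32

// checkEncodedSize (hypothetical helper) returns a descriptive error when an
// element's encoded size n cannot fit in a single protocol buffer.
func checkEncodedSize(n int64) error {
	if n < 0 {
		return fmt.Errorf("invalid encoded size %d", n)
	}
	if n > maxProtoSize {
		return fmt.Errorf("element of %d bytes exceeds the %d-byte protocol buffer limit",
			n, maxProtoSize)
	}
	return nil
}

func main() {
	fmt.Println(checkEncodedSize(8 << 20)) // an 8 MiB element fits
	fmt.Println(checkEncodedSize(3 << 30)) // a 3 GiB element does not
}
```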

The constraint that Beam must satisfy is that any valid protocol buffer can 
be transmitted. Removing the artificial restriction in datamgr.go is 
sufficient to satisfy that constraint. The remaining work I describe is just 
performance improvement on top of the basic correctness guarantee. Therefore, 
it's reasonable to remove the code now, plan for the better long-term design, 
and address it as we can.

> Support elements larger than 4 MB
> ---------------------------------
>
>                 Key: BEAM-4124
>                 URL: https://issues.apache.org/jira/browse/BEAM-4124
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-go
>            Reporter: Cody Schroeder
>            Priority: Major
>
> The Go SDK harness is limited by a gRPC message size limit of 4 MB.
> https://github.com/apache/beam/blob/4a32353/sdks/go/pkg/beam/core/runtime/harness/datamgr.go#L31



