[
https://issues.apache.org/jira/browse/BEAM-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16675454#comment-16675454
]
Bill Neubauer commented on BEAM-4124:
-------------------------------------
I agree with the consensus that the Beam code shouldn't be enforcing the
transport limits, but it needs to have some awareness of them, and that
mechanism doesn't currently exist.
Given what's been done to support larger messages in the Java code, the only
current restriction is the code in datamgr.go that was previously identified.
The intent of that code was to provide a friendlier error message than what
occurred if the gRPC max was hit.
I think it's totally reasonable to remove the check in datamgr.go since it's
providing zero value at this juncture.
More work does need to be done on a segmentation protocol for larger messages,
and that protocol needs to be able to discover the limitations of the
underlying transport (for example, could be TWIRP instead of gRPC). It should
also have some affordances for intelligently buffering data to avoid thrashing
the underlying transport, but also not buffer data so much that
latency-critical messages get delayed and cause problems in workflow execution.
The RFCs for TCP provide all sorts of guidance for these scenarios, and any
design doc for a flexible Beam transport layer should compare and contrast its
proposals to TCP.
Speaking to a point Robert made earlier, the hard limit on messaging is
provided by the use of protocol buffers, so there can't be an element > 2 GB in
size without the user's code doing something to manage the spillover in size.
The constraint that Beam must satisfy is to ensure that any valid protocol
buffer can be transmitted. Removing the artificial restriction in datamgr.go is
sufficient to achieve that constraint. The remaining work I describe is just
performance improvements on top of the basic correctness guarantee. Therefore,
it's reasonable to just remove the code now and plan for the better future and
address it as we can.
> Support elements larger than 4 MB
> ---------------------------------
>
> Key: BEAM-4124
> URL: https://issues.apache.org/jira/browse/BEAM-4124
> Project: Beam
> Issue Type: Bug
> Components: sdk-go
> Reporter: Cody Schroeder
> Priority: Major
>
> The Go SDK harness is limited by a gRPC message size limit of 4 MB.
> https://github.com/apache/beam/blob/4a32353/sdks/go/pkg/beam/core/runtime/harness/datamgr.go#L31
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)