[
https://issues.apache.org/jira/browse/BEAM-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669379#comment-16669379
]
Robert Burke commented on BEAM-4124:
------------------------------------
You are correct: this is on the sending direction rather than the receiving
direction. The comment on that line is wrong to say it concerns the "incoming"
message; it is actually about an outgoing one.
I'll need to see how it fails without it, but I'd be inclined to avoid Beam
making assumptions about the transport at this point, and rely on the more
easily overridable gRPC mechanism. I forgot about that limitation.
Beam Go is doing its own buffering ahead of the gRPC stream to batch small
elements, rather than trying to send a single element each time.
Ultimately there's a 2GB limit, due to various protocol buffer restrictions,
but setting it that high means that entire bundles might get batched instead of
sending
Thanks to [~lcwik] I found the Portability document about properly handling
large elements:
[https://docs.google.com/document/d/1IGduUqmhWDi_69l9nG8kw73HZ5WI5wOps9Tshl5wpQA/edit#heading=h.akxviyj4m0f0]
But no runner implements this yet.
So, in principle we want the ~4MB limit so that elements beyond it aren't
cached until the bundle is completed (which keeps progress moving forward and
doesn't waste network). But for elements in the range from 4MB (or some lower
bound) up to ~2GB (the upper bound), we could do the "simple" thing and flush
immediately instead of failing. Obviously this would be a problem for elements
larger than 2GB, but then I'd argue one shouldn't be processing them quite
like that with Beam. :)
This wouldn't avoid the need for a tuning parameter for some users (e.g. those
who may prefer larger than 4MB batches for perf reasons), but it certainly punts
on needing it for all users in cases where sending any element > 4MB alone is
fine.
That would also be independent of the gRPC send size, which defaults to MaxInt,
so it should be workable. The receive cap would need to be adjusted accordingly
by users who run into this issue though, since returning through GBK would
cause that limit to be hit for the stream of elements.
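To make the "flush immediately" idea above concrete, here is a minimal sketch.
It is not the Beam Go harness code: the batcher type, its field names, and
errTooLarge are hypothetical, and the limits are fields so the behaviour can be
shown with tiny values instead of 4MB and 2GB.

```go
package main

import (
	"errors"
	"fmt"
)

// errTooLarge is a hypothetical error for elements above the hard upper bound.
var errTooLarge = errors.New("element exceeds hard limit")

// batcher is a hypothetical stand-in for the buffering done ahead of the
// gRPC stream. It is an illustration only, not the actual datamgr.go logic.
type batcher struct {
	batchLimit int                // soft bound (~4MB in the discussion)
	hardLimit  int                // upper bound (~2GB in the discussion)
	buf        []byte             // small elements waiting to be sent
	send       func([]byte) error // stand-in for the stream's send call
}

// Write batches small elements. An element that does not fit in the current
// batch forces a flush first; an element that alone reaches the soft bound
// is then sent on its own right away instead of failing.
func (b *batcher) Write(elem []byte) error {
	if len(elem) > b.hardLimit {
		return errTooLarge
	}
	if len(b.buf)+len(elem) > b.batchLimit {
		if err := b.Flush(); err != nil {
			return err
		}
	}
	b.buf = append(b.buf, elem...)
	if len(b.buf) >= b.batchLimit {
		return b.Flush()
	}
	return nil
}

// Flush sends whatever is buffered. The buffer is replaced rather than
// reused, since send may hold on to the slice it was given.
func (b *batcher) Flush() error {
	if len(b.buf) == 0 {
		return nil
	}
	err := b.send(b.buf)
	b.buf = nil
	return err
}

func main() {
	var sizes []int
	b := &batcher{
		batchLimit: 8,
		hardLimit:  32,
		send: func(p []byte) error {
			sizes = append(sizes, len(p))
			return nil
		},
	}
	for _, n := range []int{3, 3, 3, 20, 40} {
		if err := b.Write(make([]byte, n)); err != nil {
			fmt.Println("write of", n, "bytes failed:", err)
		}
	}
	_ = b.Flush()
	fmt.Println("batch sizes sent:", sizes)
}
```

With these toy limits the two small elements go out as one batch of 6, the
third is flushed alone ahead of the oversized 20-byte element, which is then
sent by itself, and only the 40-byte element above the hard limit is rejected.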
> Support elements larger than 4 MB
> ---------------------------------
>
> Key: BEAM-4124
> URL: https://issues.apache.org/jira/browse/BEAM-4124
> Project: Beam
> Issue Type: Bug
> Components: sdk-go
> Reporter: Cody Schroeder
> Priority: Major
>
> The Go SDK harness is limited by a gRPC message size limit of 4 MB.
> https://github.com/apache/beam/blob/4a32353/sdks/go/pkg/beam/core/runtime/harness/datamgr.go#L31