[ https://issues.apache.org/jira/browse/BEAM-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669379#comment-16669379 ]

Robert Burke commented on BEAM-4124:
------------------------------------

You are correct: this is on the sending direction rather than the receiving 
direction. The comment on that line is wrong to describe it as applying to the 
"incoming" message; it is actually about an outgoing one.

I'll need to see how it fails without that limit, but at this point I'd be 
inclined to avoid having Beam make assumptions about the transport, and to rely 
instead on the more easily overridable gRPC mechanism. I had forgotten about 
that limitation.

Beam Go does its own buffering ahead of the gRPC stream to batch small 
elements, rather than sending each element individually.
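A minimal Go sketch of that kind of batching, for illustration only. The `batchWriter` type, its `threshold` field, and the `send` callback (standing in for a gRPC stream's `Send`) are hypothetical names, not the actual harness code in datamgr.go:

```go
package main

import (
	"bytes"
	"fmt"
)

// batchWriter is a hypothetical buffer in front of a gRPC stream: small
// encoded elements accumulate in buf and are sent together once the buffer
// reaches threshold bytes.
type batchWriter struct {
	buf       bytes.Buffer
	threshold int
	send      func([]byte) error // stands in for a gRPC stream Send call
}

// Write appends one encoded element, flushing if the batch is now full.
func (w *batchWriter) Write(elem []byte) error {
	w.buf.Write(elem)
	if w.buf.Len() >= w.threshold {
		return w.Flush()
	}
	return nil
}

// Flush sends any buffered bytes as one message and resets the buffer.
func (w *batchWriter) Flush() error {
	if w.buf.Len() == 0 {
		return nil
	}
	msg := make([]byte, w.buf.Len())
	copy(msg, w.buf.Bytes())
	w.buf.Reset()
	return w.send(msg)
}

func main() {
	var sent [][]byte
	w := &batchWriter{threshold: 10, send: func(b []byte) error {
		sent = append(sent, b)
		return nil
	}}
	for i := 0; i < 7; i++ {
		_ = w.Write([]byte("abc")) // seven 3-byte elements
	}
	_ = w.Flush()          // end of bundle: send whatever is left
	fmt.Println(len(sent)) // 2
}
```

Here seven 3-byte elements go out as two messages (12 and 9 bytes) instead of seven separate sends.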

Ultimately there's a 2GB limit due to various protocol buffer restrictions, 
but setting it that high means that entire bundles might get batched instead of 
being sent incrementally.

Thanks to [~lcwik], I found the Portability document about properly handling 
large elements:
[https://docs.google.com/document/d/1IGduUqmhWDi_69l9nG8kw73HZ5WI5wOps9Tshl5wpQA/edit#heading=h.akxviyj4m0f0]

However, no runner implements this yet.

So, in principle, we want the ~4MB limit so that elements beyond that point 
aren't cached until the bundle completes (which keeps progress moving forward 
and avoids wasting network). But for elements in the 4MB (lower bound) to ~2GB 
(upper bound) range, we could do the "simple" thing and flush them immediately 
instead of failing. Obviously this would still be a problem for elements larger 
than 2GB, but I'd argue one shouldn't be processing those quite like that with 
Beam anyway. :)
This wouldn't eliminate the need for a tuning parameter for some users (e.g., 
those who prefer batches larger than 4MB for performance reasons), but it 
certainly defers that need for all users for whom sending any element larger 
than 4MB on its own is fine.

That would also be independent of the gRPC send size limit, which defaults to 
MaxInt, so it should be workable. Users who run into this issue would still need 
to raise the receive cap accordingly, though, since elements coming back through 
a GBK (GroupByKey) would hit that limit on the stream of elements.

> Support elements larger than 4 MB
> ---------------------------------
>
>                 Key: BEAM-4124
>                 URL: https://issues.apache.org/jira/browse/BEAM-4124
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-go
>            Reporter: Cody Schroeder
>            Priority: Major
>
> The Go SDK harness is limited by a gRPC message size limit of 4 MB.
> https://github.com/apache/beam/blob/4a32353/sdks/go/pkg/beam/core/runtime/harness/datamgr.go#L31



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
