iht opened a new issue, #25447:
URL: https://github.com/apache/beam/issues/25447

   The [`withDesiredBundleSizeBytes` method of `AvroIO.ReadFiles` is marked as 
private](https://github.com/apache/beam/blob/bb5e200df70a875379ff5a6dfe72325172c3eb77/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java#L810-L813),
 which means that users always have to use the default bundle size of 64 MB.
   
   In streaming applications, this value can be too large and cause memory 
issues. For instance, [the default number of threads in Dataflow streaming 
engine is 300 per 
VM](https://cloud.google.com/dataflow/docs/guides/troubleshoot-oom#beam-javago-sdk),
 which implies that a VM would need >19 GB of memory (300 x 64 MB) to be able to 
read all the file bundles (assuming that there are at least 19 GB of data to 
be read).
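
   The arithmetic above can be sketched as a small standalone program. This is 
only an illustration, not Beam code: the class and method names are 
hypothetical, it reads "64 MB" as 64 MiB, it assumes the worst case in which 
every thread holds one full bundle in memory at once, and the 16 MiB and 4 MiB 
sizes are made-up alternatives to show how a smaller bundle size would scale.

```java
// Hypothetical helper, not part of Apache Beam: estimates worst-case memory
// when every worker thread buffers one full file bundle at the same time.
public final class BundleMemoryEstimate {
    static final long MIB = 1024L * 1024L;

    // threads * bundleSizeBytes, failing loudly on overflow instead of wrapping.
    static long worstCaseBytes(int threads, long bundleSizeBytes) {
        return Math.multiplyExact((long) threads, bundleSizeBytes);
    }

    public static void main(String[] args) {
        int threads = 300; // thread count per VM quoted in the issue
        // 64 MiB is the default from the issue; 16 and 4 MiB are assumed alternatives.
        long[] bundleSizesMiB = {64, 16, 4};
        for (long sizeMiB : bundleSizesMiB) {
            long bytes = worstCaseBytes(threads, sizeMiB * MIB);
            double gib = bytes / (1024.0 * MIB);
            System.out.printf("%d threads x %d MiB = %.2f GiB%n", threads, sizeMiB, gib);
        }
    }
}
```

   With the default size this gives 300 x 64 MiB = 19,200 MiB (18.75 GiB), in 
line with the figure quoted above, and the requirement shrinks linearly as the 
bundle size is reduced.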
   
   If that method were public, users could control the bundle size and reduce 
the amount of memory needed to use `ReadFiles` in streaming.
   
   


