wollowizard opened a new pull request, #34750: URL: https://github.com/apache/beam/pull/34750
With this change, `org.apache.beam.sdk.extensions.avro.io.AvroDatumFactory.ReflectDatumFactory` uses a cache so that it no longer re-creates `ReflectDatumReader` and `ReflectDatumWriter` instances that were already created for the same schemas. This addresses #34749.

After applying this fix manually in my own code, I see a significant reduction in memory allocation. The heap profile shows that `org.apache.beam.sdk.extensions.avro.io.AvroDatumFactory.ReflectDatumFactory#apply(org.apache.avro.Schema, org.apache.avro.Schema)` is now responsible for 0.512% of allocations, down from 27.7%.

Heap profile screenshot: https://github.com/user-attachments/assets/7da052d0-2454-4950-b678-1eecb4a6b696

This reduces memory usage and also CPU usage (likely because less CPU is spent on GC), allowing that specific streaming pipeline to run on 2 to 3 n2d-standard-4 VMs (depending on traffic) instead of 4.

-- 
This is an automated message from the Apache Git Service.
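The caching idea described above can be illustrated with a minimal sketch. This is not the code from the pull request: `SchemaPairCache`, `get`, and `size` are hypothetical names, and the type parameter `S` stands in for `org.apache.avro.Schema` while `V` stands in for a `ReflectDatumReader` or `ReflectDatumWriter`. It assumes the schema type has value-based `equals`/`hashCode` so that equal schemas map to the same cache entry, and it requires Java 16+ for the record key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

/**
 * Hypothetical sketch: memoizes objects built from a (writer schema, reader schema) pair,
 * so the expensive factory runs at most once per distinct pair.
 * S stands in for org.apache.avro.Schema; V for ReflectDatumReader/ReflectDatumWriter.
 */
final class SchemaPairCache<S, V> {

  /** Cache key; the record's equals/hashCode delegate to the schemas' own equals/hashCode. */
  private record Key<T>(T writer, T reader) {}

  private final ConcurrentHashMap<Key<S>, V> cache = new ConcurrentHashMap<>();
  private final BiFunction<S, S, V> factory;

  SchemaPairCache(BiFunction<S, S, V> factory) {
    this.factory = factory;
  }

  /** Returns the cached value for this schema pair, building it with the factory on first use. */
  V get(S writer, S reader) {
    return cache.computeIfAbsent(
        new Key<>(writer, reader), k -> factory.apply(k.writer(), k.reader()));
  }

  /** Number of distinct schema pairs seen so far. */
  int size() {
    return cache.size();
  }
}
```

`ConcurrentHashMap.computeIfAbsent` applies the mapping function at most once per key, so concurrent callers asking for the same schema pair receive the same instance. The cache is unbounded, which assumes a pipeline sees only a small, fixed set of schema pairs. The sketch also does not settle whether the cached reader/writer objects are safe to share across threads; that would need to be checked against Avro's implementation.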