wollowizard opened a new pull request, #34750:
URL: https://github.com/apache/beam/pull/34750

   
`org.apache.beam.sdk.extensions.avro.io.AvroDatumFactory.ReflectDatumFactory` 
uses a cache to avoid re-creating `ReflectDatumReader` and `ReflectDatumWriter` 
instances that were previously created for the same schemas.
   
   This addresses #34749.
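
For readers unfamiliar with the pattern, here is a minimal sketch of caching one instance per (writer schema, reader schema) pair. It is not the code in this PR: `SchemaPairCache`, its type parameters, and the `factory` argument are hypothetical stand-ins, and the sketch deliberately avoids Avro and Beam classes so it compiles on its own. It assumes the key type has value-based `equals`/`hashCode` and that cached instances are safe to reuse, which would need to be checked for the real reader and writer classes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

/** Hypothetical helper (not Beam or Avro API): memoizes one value per (writer, reader) pair. */
final class SchemaPairCache<S, T> {
  // Map.entry gives an immutable key with value-based equals/hashCode; it rejects nulls.
  private final Map<Map.Entry<S, S>, T> cache = new ConcurrentHashMap<>();

  /** Returns the cached value for the pair, calling the factory only on the first request. */
  T get(S writer, S reader, BiFunction<S, S, T> factory) {
    return cache.computeIfAbsent(
        Map.entry(writer, reader), key -> factory.apply(key.getKey(), key.getValue()));
  }

  /** Number of distinct pairs seen so far. */
  int size() {
    return cache.size();
  }
}
```

With this shape, repeated calls for the same schema pair return the same object instead of allocating a new one each time. Note that the sketch never evicts entries, so it only suits workloads with a bounded number of distinct schema pairs.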
   
   After applying this fix (manually in my code), I see significant memory 
allocation improvements. The heap profile shows that 
`org.apache.beam.sdk.extensions.avro.io.AvroDatumFactory.ReflectDatumFactory#apply(org.apache.avro.Schema, org.apache.avro.Schema)` 
is now responsible for 0.512% of allocations, down from 27.7%.
   
   <img width="1630" alt="image" src="https://github.com/user-attachments/assets/7da052d0-2454-4950-b678-1eecb4a6b696" />
   
   This reduces memory usage and also CPU usage (likely because less CPU time 
is spent on GC), allowing that specific streaming pipeline to run on 2 to 3 
n2d-standard-4 VMs (depending on traffic) instead of 4.
   
   

