[ https://issues.apache.org/jira/browse/AVRO-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18077856#comment-18077856 ]

Michael Skells commented on AVRO-3524:
--------------------------------------

Hi [~opwvhk], I saw this issue recently, or something similar, with the
FastReaderBuilder cache, where I had 100 GB of schema caches when in reality
there were only 4 very deep schemas currently active. The problem seemed to be
that one cycle of GC, finalisation and weak cache access (checking the weak
ReferenceQueue) clears only one level of the cached objects. Because our schemas
were very deep, new entries were being generated faster than they were being cleared.
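To make the "one level per cycle" argument concrete, here is a toy model. It is hypothetical: it is not the FastReaderBuilder code, and `WeakCacheDrainModel` is a made-up name. It assumes each cached value strongly references the cache keys of its child schemas, and that a cleared weak key is only expunged on the next cache access, so a child key stays reachable until its parent's entry is gone. Under those assumptions the number of GC-plus-expunge cycles needed to empty the cache equals the depth of the schema tree.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model (hypothetical, not Avro code): each cached value strongly
// references the cache keys of its child schemas. A key becomes collectable
// only once no surviving entry's value still references it.
final class WeakCacheDrainModel {

    /**
     * Returns how many GC-plus-expunge cycles it takes to empty the cache.
     * children maps each cached key to the child keys its value references;
     * no key is assumed to be referenced from outside the cache.
     */
    static int cyclesToDrain(Map<Integer, List<Integer>> children) {
        Set<Integer> live = new HashSet<>(children.keySet());
        int cycles = 0;
        while (!live.isEmpty()) {
            // Keys still referenced by the value of a live (not yet expunged) entry.
            Set<Integer> stillReferenced = new HashSet<>();
            for (Integer key : live) {
                stillReferenced.addAll(children.getOrDefault(key, Collections.emptyList()));
            }
            // One cycle: GC clears the unreferenced keys, then one cache access
            // expunges their entries (and with them the references to children).
            List<Integer> cleared = new ArrayList<>();
            for (Integer key : live) {
                if (!stillReferenced.contains(key)) {
                    cleared.add(key);
                }
            }
            if (cleared.isEmpty()) {
                throw new IllegalStateException("reference cycle: entries can never be cleared");
            }
            live.removeAll(cleared);
            cycles++;
        }
        return cycles;
    }
}
```

The reference-cycle check mirrors a known pitfall of weakly keyed caches such as `WeakHashMap`: if cached values refer, directly or indirectly, to their own keys, those entries are never released.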

That's why I raised [https://github.com/apache/avro/pull/3746] for the issue
https://issues.apache.org/jira/browse/AVRO-4249
[~ywc999], does that fix your issue?

> Memory leak when not reusing avro schema instance
> -------------------------------------------------
>
>                 Key: AVRO-3524
>                 URL: https://issues.apache.org/jira/browse/AVRO-3524
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.9.2, 1.10.2
>         Environment: * OpenJDK 8
>  * tested in Avro 1.9.2 and 1.10.2
>            Reporter: Yu-Wu Chu
>            Priority: Major
>         Attachments: jira-not-share.png, jira-shared.png
>
>
> When deserializing Avro records without a shared schema instance, memory
> usage keeps growing with the number of records deserialized.
> Code with shared schema:
> {code:java}
> public void myTest() throws Exception {
>     Schema schema = new Schema.Parser().parse(schemaString);
>     final AvroEntity avroEntity = buildAvroEntity();
>     final ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
>     final BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(outputStream, null);
>     final DatumWriter<AvroEntity> writer = new SpecificDatumWriter<>(schema);
>     writer.write(avroEntity, encoder);
>     encoder.flush();
>     final byte[] data = outputStream.toByteArray();
>     DatumReader<AvroEntity> reader = new SpecificDatumReader<>(schema);
>     int count = 0;
>     while (count < 100000) {
>         final Decoder decoder = DecoderFactory.get().binaryDecoder(data, null);
>         //final Schema mySchema = new Schema.Parser().parse(schemaString);
>         // Shared schema: the same Schema instance is passed on every iteration.
>         reader.setSchema(schema);
>         reader.read(null, decoder);
>         count++;
>         if (count % 1000 == 0) {
>             System.gc();
>             System.out.println("test" + count);
>         }
>     }
>     System.out.println("test" + count);
> }{code}
>  
> Code without shared schema:
> {code:java}
> public void myTest() throws Exception {
>     final Schema schema = new Schema.Parser().parse(schemaString);
>     final AvroEntity avroEntity = buildAvroEntity();
>     final ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
>     final BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(outputStream, null);
>     final DatumWriter<AvroEntity> writer = new SpecificDatumWriter<>(schema);
>     writer.write(avroEntity, encoder);
>     encoder.flush();
>     final byte[] data = outputStream.toByteArray();
>     DatumReader<AvroEntity> reader = new SpecificDatumReader<>(schema);
>     int count = 0;
>     while (count < 100000) {
>         final Decoder decoder = DecoderFactory.get().binaryDecoder(data, null);
>         // Not shared: a freshly parsed Schema instance on every iteration.
>         final Schema mySchema = new Schema.Parser().parse(schemaString);
>         reader.setSchema(mySchema);
>         reader.read(null, decoder);
>         count++;
>         if (count % 1000 == 0) {
>             System.gc();
>             System.out.println("test" + count);
>         }
>     }
>     System.out.println("test" + count);
> }{code}
>  
> The number of ConcurrentHashMap$Node instances is 5,000 with the shared schema
> versus 1,500,000 with the non-shared schema.
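The shared-schema variant in the report avoids the growth by reusing one Schema instance. When schema strings arrive at runtime, one way to get the same effect is to intern parsed schemas by their source text. The sketch below is hypothetical (`SchemaInterner` is not an Avro class) and is kept generic so it compiles without Avro on the classpath; with Avro it would be constructed as `new SchemaInterner<Schema>(s -> new Schema.Parser().parse(s))` and used as `reader.setSchema(interner.intern(schemaString))`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical helper: parse each distinct schema string once and hand back
// the same instance afterwards, so caches keyed on the schema object see one
// key per distinct schema instead of a fresh key per record.
final class SchemaInterner<S> {
    private final Map<String, S> bySource = new ConcurrentHashMap<>();
    private final Function<String, S> parser;

    SchemaInterner(Function<String, S> parser) {
        this.parser = parser;
    }

    // ConcurrentHashMap.computeIfAbsent applies the parser at most once per key.
    S intern(String schemaSource) {
        return bySource.computeIfAbsent(schemaSource, parser);
    }

    // Number of distinct schema strings parsed so far.
    int size() {
        return bySource.size();
    }
}
```

The interner is itself an unbounded map, so it only suits a bounded set of distinct schema strings, and it does not change how quickly the FastReaderBuilder cache releases deep schemas.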



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
