[ 
https://issues.apache.org/jira/browse/AVRO-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633747#comment-17633747
 ] 

Oscar Westra van Holthe - Kind commented on AVRO-3524:
------------------------------------------------------

>From what I can see, this is a result of caching via {{WeakIdentityHashMap}} 
>in the class {{{}GenericDatumReader{}}}, lines 109-110 and 124-133. This adds 
>the newly parsed schema (which includes a {{ConcurrentHashMap}} for each 
>field&(sub)schema in the schema structure) to a cache. The cache is used to 
>reuse resolutions between writer and reader schemata, which speeds up reading 
>– especially when using the single message encoding, e.g. from message 
>pipelines.

The objects can be released, but this will only happen when the garbage 
collector would otherwise run out of memory.

> Memory leak when not reusing avro schema instance
> -------------------------------------------------
>
>                 Key: AVRO-3524
>                 URL: https://issues.apache.org/jira/browse/AVRO-3524
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.9.2, 1.10.2
>         Environment: * openJdk 8
>  * tested in Avro 1.9.2 and 1.10.2
>            Reporter: Yu-Wu Chu
>            Priority: Major
>         Attachments: jira-not-share.png, jira-shared.png
>
>
> When deserializing avro record, if we do not use shared schema instance, the 
> memory usage start growing as the number of deserializing growth.
> Code with shared schema:
> {code:java}
> public void myTest() throws Exception {
>     Schema schema = new Schema.Parser().parse(schemaString);
>     final AvroEntity avroEntity = buildAvroEntity();
>     final ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
>     final BinaryEncoder encoder = 
> EncoderFactory.get().binaryEncoder(outputStream, null);
>     final DatumWriter<AvroEntity> writer = new SpecificDatumWriter<>(schema);
>     writer.write( avroEntity, encoder);
>     encoder.flush();
>     final byte[] data = outputStream.toByteArray();
>     DatumReader<AvroEntity> reader =new SpecificDatumReader<>(schema);
>     int count = 0;
>     while (count < 100000) {
>         final Decoder decoder = DecoderFactory.get().binaryDecoder(data, 
> null);
>         //final Schema mySchema = new Schema.Parser().parse(schemaString);
>         reader.setSchema(schema);
>         reader.read(null, decoder);
>         count++;
>         if (count % 1000 == 0) {
>             System.gc();
>             System.out.println("test" + count);
>         }
>     }
>     System.out.println("test" + count);
> }{code}
>  
> Code without shared schema:
> {code:java}
> public void myTest() throws Exception {
>     schema = new Schema.Parser().parse(schemaString);
>     final AvroEntity avroEntity = buildAvroEntity();
>     final ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
>     final BinaryEncoder encoder = 
> EncoderFactory.get().binaryEncoder(outputStream, null);
>     final DatumWriter<AvroEntity> writer = new SpecificDatumWriter<>(schema);
>     writer.write( avroEntity, encoder);
>     encoder.flush();
>     final byte[] data = outputStream.toByteArray();
>     DatumReader<AvroEntity> reader =new SpecificDatumReader<>(schema);
>     int count = 0;
>     while (count < 100000) {
>         final Decoder decoder = DecoderFactory.get().binaryDecoder(data, 
> null);
>         final Schema mySchema = new Schema.Parser().parse(schemaString);
>         reader.setSchema(mySchema);
>         reader.read(null, decoder);
>         count++;
>         if (count % 1000 == 0) {
>             System.gc();
>             System.out.println("test" + count);
>         }
>     }
>     System.out.println("test" + count);
> }{code}
>  
> Number of ConcurrentHashMapNode instances between shared schema and 
> not-shared schema are 5,000 vs 1,500,000.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to