[
https://issues.apache.org/jira/browse/AVRO-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633747#comment-17633747
]
Oscar Westra van Holthe - Kind commented on AVRO-3524:
------------------------------------------------------
From what I can see, this is a result of caching via {{WeakIdentityHashMap}}
in the class {{GenericDatumReader}}, lines 109-110 and 124-133. This adds
the newly parsed schema (which includes a {{ConcurrentHashMap}} for each
field and (sub)schema in the schema structure) to a cache. The cache is used
to reuse resolutions between writer and reader schemata, which speeds up
reading – especially when using the single message encoding, e.g. in message
pipelines.
The objects can be released, but that will only happen when the garbage
collector would otherwise run out of memory.
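The effect of identity-keyed caching can be sketched with plain JDK maps (a
minimal illustration using {{IdentityHashMap}} as a stand-in for Avro's
{{WeakIdentityHashMap}}, not Avro's actual code): two schemas parsed from the
same schema string are equal but distinct objects, so each parse adds a fresh
cache entry.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

public class IdentityCacheDemo {
    public static void main(String[] args) {
        // Two equal but distinct objects, standing in for two Schema
        // instances parsed from the same schema string.
        String first = new String("schema");
        String second = new String("schema");

        // An identity-keyed map (analogous to Avro's WeakIdentityHashMap)
        // compares keys with ==, so each parsed instance gets its own entry.
        Map<String, Object> identityCache = new IdentityHashMap<>();
        identityCache.put(first, new Object());
        identityCache.put(second, new Object());
        System.out.println(identityCache.size()); // 2: one entry per instance

        // An equality-keyed map would reuse a single entry instead.
        Map<String, Object> equalityCache = new HashMap<>();
        equalityCache.put(first, new Object());
        equalityCache.put(second, new Object());
        System.out.println(equalityCache.size()); // 1
    }
}
```

This is why re-parsing the schema on every read accumulates cache entries
(each only reclaimable by the garbage collector), while reusing one schema
instance keeps the cache at a single entry per writer/reader pair.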
> Memory leak when not reusing avro schema instance
> -------------------------------------------------
>
> Key: AVRO-3524
> URL: https://issues.apache.org/jira/browse/AVRO-3524
> Project: Apache Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.9.2, 1.10.2
> Environment: * openJdk 8
> * tested in Avro 1.9.2 and 1.10.2
> Reporter: Yu-Wu Chu
> Priority: Major
> Attachments: jira-not-share.png, jira-shared.png
>
>
> When deserializing an Avro record, if we do not reuse a shared schema
> instance, memory usage keeps growing as the number of deserializations grows.
> Code with shared schema:
> {code:java}
> public void myTest() throws Exception {
>     Schema schema = new Schema.Parser().parse(schemaString);
>     final AvroEntity avroEntity = buildAvroEntity();
>     final ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
>     final BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(outputStream, null);
>     final DatumWriter<AvroEntity> writer = new SpecificDatumWriter<>(schema);
>     writer.write(avroEntity, encoder);
>     encoder.flush();
>     final byte[] data = outputStream.toByteArray();
>     DatumReader<AvroEntity> reader = new SpecificDatumReader<>(schema);
>     int count = 0;
>     while (count < 100000) {
>         final Decoder decoder = DecoderFactory.get().binaryDecoder(data, null);
>         // final Schema mySchema = new Schema.Parser().parse(schemaString);
>         reader.setSchema(schema);
>         reader.read(null, decoder);
>         count++;
>         if (count % 1000 == 0) {
>             System.gc();
>             System.out.println("test" + count);
>         }
>     }
>     System.out.println("test" + count);
> }{code}
>
> Code without shared schema:
> {code:java}
> public void myTest() throws Exception {
>     Schema schema = new Schema.Parser().parse(schemaString);
>     final AvroEntity avroEntity = buildAvroEntity();
>     final ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
>     final BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(outputStream, null);
>     final DatumWriter<AvroEntity> writer = new SpecificDatumWriter<>(schema);
>     writer.write(avroEntity, encoder);
>     encoder.flush();
>     final byte[] data = outputStream.toByteArray();
>     DatumReader<AvroEntity> reader = new SpecificDatumReader<>(schema);
>     int count = 0;
>     while (count < 100000) {
>         final Decoder decoder = DecoderFactory.get().binaryDecoder(data, null);
>         final Schema mySchema = new Schema.Parser().parse(schemaString);
>         reader.setSchema(mySchema);
>         reader.read(null, decoder);
>         count++;
>         if (count % 1000 == 0) {
>             System.gc();
>             System.out.println("test" + count);
>         }
>     }
>     System.out.println("test" + count);
> }{code}
>
> The number of ConcurrentHashMap$Node instances with a shared schema vs. a
> non-shared schema is 5,000 vs 1,500,000.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)