[
https://issues.apache.org/jira/browse/AVRO-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18077856#comment-18077856
]
Michael Skells commented on AVRO-3524:
--------------------------------------
Hi [~opwvhk], I saw this issue recently, or something similar, from the
FastReaderBuilder cache, where I had 100GB of schema caches (when in reality
there were only 4 very deep schemas currently active). The problem seemed to be
that one cycle of GC, finalisation and weak cache access (checking the
WeakReferenceQueue) clears one level of the cached objects, but we had very
deep schemas, so they were being generated faster than they were being cleared.
That's why I raised [https://github.com/apache/avro/pull/3746] for the issue
https://issues.apache.org/jira/browse/AVRO-4249
[~ywc999] does that fix your issue?
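
A minimal sketch of the layered clearing described above, assuming each cached value strongly references the key of the next level down. `LayeredWeakCacheDemo`, `Node` and the use of `java.util.WeakHashMap` are stand-ins chosen for illustration, not Avro's actual cache classes. `System.gc()` is only a hint, so the printed sizes can differ between runs and JVMs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;

public class LayeredWeakCacheDemo {

    /** Hypothetical stand-in for a nested schema: each level holds its child strongly. */
    static final class Node {
        final Node child;

        Node(Node child) {
            this.child = child;
        }
    }

    /** Builds a chain of {@code depth} nodes and returns the outermost one. */
    static Node buildChain(int depth) {
        Node node = null;
        for (int i = 0; i < depth; i++) {
            node = new Node(node);
        }
        return node;
    }

    /**
     * Caches every level under a weak key. The value stored for a level strongly
     * references the next level down (an assumption of this sketch), so a child
     * key stays strongly reachable until its parent's stale entry is expunged.
     */
    static Map<Node, Object> cacheAllLevels(Node root) {
        Map<Node, Object> cache = new WeakHashMap<>();
        for (Node n = root; n != null; n = n.child) {
            cache.put(n, n.child != null ? n.child : new Object());
        }
        return cache;
    }

    /** Runs GC-plus-access cycles and records the cache size after each one. */
    static List<Integer> sizesAfterCycles(Map<Node, Object> cache, int cycles) {
        List<Integer> sizes = new ArrayList<>();
        for (int i = 0; i < cycles; i++) {
            System.gc(); // only a hint; the JVM may ignore it
            try {
                Thread.sleep(50); // give cleared references time to be enqueued
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            sizes.add(cache.size()); // size() expunges entries whose key was cleared
        }
        return sizes;
    }

    public static void main(String[] args) {
        int depth = 6;
        // The chain is not referenced from here after this call, only from the cache.
        Map<Node, Object> cache = cacheAllLevels(buildChain(depth));
        System.out.println("depth: " + depth);
        System.out.println("cache size after each cycle: " + sizesAfterCycles(cache, depth + 2));
    }
}
```

If the sizes shrink by roughly one entry per cycle rather than all at once, that matches the behaviour above: a chain of depth N needs about N cycles of GC plus cache access before it is fully released.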
> Memory leak when not reusing avro schema instance
> -------------------------------------------------
>
> Key: AVRO-3524
> URL: https://issues.apache.org/jira/browse/AVRO-3524
> Project: Apache Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.9.2, 1.10.2
> Environment: * openJdk 8
> * tested in Avro 1.9.2 and 1.10.2
> Reporter: Yu-Wu Chu
> Priority: Major
> Attachments: jira-not-share.png, jira-shared.png
>
>
> When deserializing an Avro record without a shared schema instance, memory
> usage grows with the number of deserializations.
> Code with shared schema:
> {code:java}
> public void myTest() throws Exception {
>     Schema schema = new Schema.Parser().parse(schemaString);
>     final AvroEntity avroEntity = buildAvroEntity();
>     final ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
>     final BinaryEncoder encoder =
>             EncoderFactory.get().binaryEncoder(outputStream, null);
>     final DatumWriter<AvroEntity> writer = new SpecificDatumWriter<>(schema);
>     writer.write(avroEntity, encoder);
>     encoder.flush();
>     final byte[] data = outputStream.toByteArray();
>     DatumReader<AvroEntity> reader = new SpecificDatumReader<>(schema);
>     int count = 0;
>     while (count < 100000) {
>         final Decoder decoder = DecoderFactory.get().binaryDecoder(data,
>                 null);
>         //final Schema mySchema = new Schema.Parser().parse(schemaString);
>         reader.setSchema(schema);
>         reader.read(null, decoder);
>         count++;
>         if (count % 1000 == 0) {
>             System.gc();
>             System.out.println("test" + count);
>         }
>     }
>     System.out.println("test" + count);
> }{code}
>
> Code without shared schema:
> {code:java}
> public void myTest() throws Exception {
>     Schema schema = new Schema.Parser().parse(schemaString);
>     final AvroEntity avroEntity = buildAvroEntity();
>     final ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
>     final BinaryEncoder encoder =
>             EncoderFactory.get().binaryEncoder(outputStream, null);
>     final DatumWriter<AvroEntity> writer = new SpecificDatumWriter<>(schema);
>     writer.write(avroEntity, encoder);
>     encoder.flush();
>     final byte[] data = outputStream.toByteArray();
>     DatumReader<AvroEntity> reader = new SpecificDatumReader<>(schema);
>     int count = 0;
>     while (count < 100000) {
>         final Decoder decoder = DecoderFactory.get().binaryDecoder(data,
>                 null);
>         // Only difference from the shared case: a new Schema instance per iteration.
>         final Schema mySchema = new Schema.Parser().parse(schemaString);
>         reader.setSchema(mySchema);
>         reader.read(null, decoder);
>         count++;
>         if (count % 1000 == 0) {
>             System.gc();
>             System.out.println("test" + count);
>         }
>     }
>     System.out.println("test" + count);
> }{code}
>
> The number of ConcurrentHashMap$Node instances is 5,000 with the shared
> schema versus 1,500,000 without it.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)