Matt Allen created AVRO-1881:
--------------------------------

             Summary: Avro (Java) Memory Leak when reusing JsonDecoder instance
                 Key: AVRO-1881
                 URL: https://issues.apache.org/jira/browse/AVRO-1881
             Project: Avro
          Issue Type: Bug
          Components: java
    Affects Versions: 1.8.1
         Environment: Ubuntu 15.04
Oracle 1.8.0_91 and OpenJDK 1.8.0_45
            Reporter: Matt Allen


{{JsonDecoder}} maintains per-record state while decoding, leading to a memory 
leak if the same instance is reused for multiple inputs. Calling 
{{JsonDecoder.configure}} to switch to a new input does not clean up the state 
stored in {{JsonDecoder.reorderBuffers}}, so an unbounded number of 
{{ReorderBuffer}} instances accumulate. Creating a new {{JsonDecoder}} for 
each input avoids the leak, but is significantly more expensive than reusing a 
single instance.
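
For illustration, this is the kind of cleanup I would expect {{configure}} to 
perform. This is a hypothetical sketch based on a quick read of the 1.8.1 
source, not the actual Avro code, and resetting {{currentReorderBuffer}} is an 
assumption on my part:
{code:title=Hypothetical cleanup in configure (sketch)|borderStyle=solid}
// Hypothetical sketch, not the actual Avro source: if configure() reset the
// reorder state along with the parser, stale ReorderBuffer instances could
// not accumulate across inputs.
public JsonDecoder configure(String in) throws IOException {
    parser.reset();
    reorderBuffers.clear();      // drop buffers left over from the previous input
    currentReorderBuffer = null; // assumption: this field also needs resetting
    // ... existing logic that creates the new Jackson parser for `in` ...
    return this;
}
{code}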

This problem seems to occur only when the input schema contains a record, 
which is consistent with {{reorderBuffers}} being the source of the leak. From 
a first look at the {{JsonDecoder}} code, I would expect the 
{{reorderBuffers}} stack to be empty once a record has been fully processed, 
so there may be other behavior at play here.
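
One quick way to test that hypothesis, sketched below but not something I 
have run, is to reuse a decoder with a bare {{long}} schema (no record, so 
nothing to reorder) and confirm that memory stays flat:
{code:title=NonRecordSchemaCheck.java (hypothetical)|borderStyle=solid}
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.JsonDecoder;
import java.io.IOException;

public class NonRecordSchemaCheck {
    public static void main(String[] args) throws IOException {
        // Bare "long" schema: no record fields to reorder, so reuse should
        // push nothing onto reorderBuffers (hypothesis, not verified).
        Schema longSchema = new Schema.Parser().parse("\"long\"");
        JsonDecoder decoder = DecoderFactory.get().jsonDecoder(longSchema, "0");
        GenericDatumReader<Object> reader = new GenericDatumReader<Object>(longSchema);
        for (long i = 0; i < 6000000; i++) {
            decoder.configure(Long.toString(i));
            reader.read(null, decoder);
        }
        System.out.println("Completed without OutOfMemoryError");
    }
}
{code}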

The following minimal example exhausts a 50MB heap ({{-Xmx50m}}) after about 
5.25 million iterations. The first loop demonstrates that no memory leak 
occurs when a fresh {{JsonDecoder}} instance is created for each input.
{code:title=JsonDecoderMemoryLeak.java|borderStyle=solid}
import org.apache.avro.Schema;
import org.apache.avro.io.*;
import org.apache.avro.generic.*;
import java.io.IOException;

public class JsonDecoderMemoryLeak {
    public static DecoderFactory decoderFactory = DecoderFactory.get();

    public static JsonDecoder createDecoder(String input, Schema schema) throws IOException {
        return decoderFactory.jsonDecoder(schema, input);
    }

    public static Object decodeAvro(String input, Schema schema, JsonDecoder decoder) throws IOException {
        if (decoder == null) {
            decoder = createDecoder(input, schema);
        } else {
            decoder.configure(input);
        }
        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);
        return reader.read(null, decoder);
    }

    public static Schema.Parser parser = new Schema.Parser();
    public static Schema schema = parser.parse("{\"name\": \"TestRecord\", \"type\": \"record\", \"fields\": [{\"name\": \"field1\", \"type\": \"long\"}]}");

    public static String record(long i) {
        StringBuilder builder = new StringBuilder("{\"field1\": ");
        builder.append(i);
        builder.append("}");
        return builder.toString();
    }


    public static void main(String[] args) throws IOException {
        // No memory issues when creating a new decoder for each record
        System.out.println("Running with fresh JsonDecoder instances for 
6000000 iterations");
        for(long i = 0; i < 6000000; i++) {
            decodeAvro(record(i), schema, null);
        }
        
        // Runs out of memory after ~5250000 records
        System.out.println("Running with a single reused JsonDecoder instance");
        long count = 0;
        try {
            JsonDecoder decoder = createDecoder(record(0), schema);
            while(true) {
                decodeAvro(record(count), schema, decoder);
                count++;
            }
        } catch (OutOfMemoryError e) {
            System.out.println("Out of memory after " + count + " records");
            e.printStackTrace();
        }
    }
}
{code}

{code:title=Output|borderStyle=solid}
$ java -Xmx50m -jar json-decoder-memory-leak.jar 
Running with fresh JsonDecoder instances for 6000000 iterations
Running with a single reused JsonDecoder instance
Out of memory after 5242880 records
java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3210)
        at java.util.Arrays.copyOf(Arrays.java:3181)
        at java.util.Vector.grow(Vector.java:266)
        at java.util.Vector.ensureCapacityHelper(Vector.java:246)
        at java.util.Vector.addElement(Vector.java:620)
        at java.util.Stack.push(Stack.java:67)
        at org.apache.avro.io.JsonDecoder.doAction(JsonDecoder.java:487)
        at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
        at org.apache.avro.io.JsonDecoder.advance(JsonDecoder.java:139)
        at org.apache.avro.io.JsonDecoder.readLong(JsonDecoder.java:178)
        at org.apache.avro.io.ResolvingDecoder.readLong(ResolvingDecoder.java:162)
        at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:183)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
        at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:240)
        at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:230)
        at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:174)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:144)
        at com.spiceworks.App.decodeAvro(App.java:25)
        at com.spiceworks.App.main(App.java:52)
{code}
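
Until {{configure}} cleans up after itself, one possible interim workaround 
is to clear the stack reflectively between inputs. This is an untested sketch 
that reaches into the private {{reorderBuffers}} field named above, so it is 
fragile and will break if that field is renamed or removed:
{code:title=Possible interim workaround (untested sketch)|borderStyle=solid}
import java.lang.reflect.Field;
import java.util.Stack;
import org.apache.avro.io.JsonDecoder;

public class ReorderBufferWorkaround {
    // Untested sketch: empties the private reorderBuffers stack via
    // reflection so leftover ReorderBuffer instances cannot accumulate.
    public static void clearReorderBuffers(JsonDecoder decoder) throws ReflectiveOperationException {
        Field f = JsonDecoder.class.getDeclaredField("reorderBuffers");
        f.setAccessible(true);
        ((Stack<?>) f.get(decoder)).clear();
    }
}
{code}
Calling {{clearReorderBuffers(decoder)}} after each {{reader.read}} call in 
the example above should keep the stack from growing, at the cost of a 
reflective access per record.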



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
