invalid int encoding with binary format
---------------------------------------

                 Key: AVRO-1058
                 URL: https://issues.apache.org/jira/browse/AVRO-1058
             Project: Avro
          Issue Type: Bug
          Components: java
    Affects Versions: 1.6.3, 1.6.2
            Reporter: wolfgang hoschek


The Java binary format sometimes throws an "invalid int encoding" exception 
and fails to roundtrip a record, even though the JSON format roundtrips the 
same record just fine.
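
For context, the Avro specification encodes int values with zig-zag coding 
followed by a variable-length (base-128) byte sequence, so a valid int occupies 
at most five bytes. The sketch below is not Avro's code: ZigZagVarint and its 
methods are hypothetical names, and the assumption (not verified against the 
1.6.x source) is that the decoder reports "invalid int encoding" when a fifth 
byte still has its continuation bit set.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical illustration of zig-zag varint coding; not taken from Avro.
final class ZigZagVarint {

    /** Zig-zag encodes n, then writes it as a base-128 varint (1 to 5 bytes). */
    static byte[] encode(int n) {
        int z = (n << 1) ^ (n >> 31);        // small magnitudes become small unsigned values
        ByteArrayOutputStream out = new ByteArrayOutputStream(5);
        while ((z & ~0x7F) != 0) {
            out.write((z & 0x7F) | 0x80);    // low 7 bits plus continuation bit
            z >>>= 7;
        }
        out.write(z);                        // final byte, continuation bit clear
        return out.toByteArray();
    }

    /** Decodes one varint from the start of buf and undoes the zig-zag step. */
    static int decode(byte[] buf) throws IOException {
        int z = 0;
        for (int i = 0; i < 5; i++) {
            if (i >= buf.length) {
                throw new EOFException("truncated varint");
            }
            int b = buf[i] & 0xFF;
            z |= (b & 0x7F) << (7 * i);
            if ((b & 0x80) == 0) {
                return (z >>> 1) ^ -(z & 1);
            }
        }
        // Five bytes consumed and the continuation bit is still set.
        throw new IOException("Invalid int encoding");
    }

    public static void main(String[] args) throws IOException {
        for (int n : new int[] {0, -1, 64, Integer.MIN_VALUE, Integer.MAX_VALUE}) {
            System.out.println(n + " -> " + encode(n).length + " byte(s) -> " + decode(encode(n)));
        }
    }
}
```

Under that assumption, hitting this error in the middle of otherwise valid 
data would suggest that the decoder is reading from the wrong offset, for 
example after an earlier length prefix was misread.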

In addition, there is a separate bug: with both the binary and JSON formats, 
read() sometimes keeps returning null and never throws EOFException to 
indicate end-of-stream, so the read loop never terminates. This causes an 
OutOfMemoryError in the test driver because it adds null to a list of records 
forever.
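
Until that is fixed, a test driver can protect itself by bounding the read 
loop with the number of records it wrote. The following is only a sketch: 
RecordSource and BoundedDrain are hypothetical stand-ins for whatever reader 
the driver wraps, not Avro classes. Null is deliberately not treated as an 
error, since it can be a legitimate datum for a "null" schema.

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for a test driver; not part of Avro.
final class BoundedDrain {

    /** Stand-in for a reader that signals end-of-stream with EOFException. */
    interface RecordSource<T> {
        T read() throws IOException;
    }

    /**
     * Reads until EOFException, but fails fast if the source yields more
     * records than were written instead of growing the list without bound.
     */
    static <T> List<T> readAll(RecordSource<T> source, int expected) throws IOException {
        List<T> records = new ArrayList<>(expected);
        while (true) {
            T record;
            try {
                record = source.read();
            } catch (EOFException endOfStream) {
                return records;
            }
            if (records.size() == expected) {
                throw new IllegalStateException(
                        "read more than " + expected + " records without EOFException");
            }
            records.add(record);
        }
    }

    public static void main(String[] args) throws IOException {
        try {
            readAll(() -> null, 3);   // a source that never signals end-of-stream
        } catch (IllegalStateException e) {
            System.out.println("failed fast: " + e.getMessage());
        }
    }
}
```

With this guard the runaway case surfaces as an immediate IllegalStateException 
rather than an OutOfMemoryError much later.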

The attached Java test case demonstrates the problems. It walks all *.avsc 
and *.avpr files in the code base, generates random records based on those 
schemas, roundtrips the records, and then compares the records before and 
after the roundtrip. To see it fail, comment out portions of the following 
snippet:

if (roundtripType == RoundtripType.BINARY_AVRO
        && schemaFile.getName().equals("weather.avsc") && i >= 350) {
        // FIXME tmp work-around for avro bug (invalid int encoding on large string)
        continue;
}
if (roundtripType == RoundtripType.BINARY_AVRO
        && schemaFile.getName().equals("Json.avsc") && i >= 1) {
        // FIXME tmp work-around for avro bug (invalid int encoding on large string)
        continue;
}
if (roundtripType == RoundtripType.BINARY_AVRO
        && schemaFile.getName().equals("WordCount.avsc") && i >= 2) {
        // FIXME tmp work-around for avro bug (invalid int encoding on large string)
        continue;
}
if (roundtripType == RoundtripType.BINARY_AVRO
        && schemaFile.getName().equals("mr_events.avpr") && i >= 0) {
        // FIXME tmp work-around for avro bug (invalid int encoding on large string)
        continue;
}
if (schemaFile.getName().equals("OnTheClasspath.avsc")) {
        continue; // FIXME tmp work-around for avro bug (OutOfMemoryError)
}
if (schemaFile.getName().equals("OnTheClasspath.avpr")) {
        continue; // FIXME tmp work-around for avro bug (OutOfMemoryError)
}
if (schemaFile.getName().equals("import.avpr")) {
        continue; // FIXME tmp work-around for avro bug (OutOfMemoryError)
}
if (schemaFile.getName().equals("namespaces.avpr")) {
        continue; // FIXME tmp work-around for avro bug (OutOfMemoryError)
}

Finally, there is a third separate issue, which is described in the javadoc for 
test method fixup():

        /**
         * You can trigger Record.equals() failures by modifying RandomData to
         * spit out Strings rather than Utf8 objects.
         *
         * This hack replaces all occurrences of Utf8 objects with String
         * objects in the given avro record tree. This is sometimes necessary
         * to make Record.equals() work correctly because Avro deserialization
         * deserializes String objects as Utf8 objects, and String.equals(Utf8)
         * returns false even if Utf8.equals(String) would return true.
         *
         * In this particular test scenario this fixup hack might not be
         * necessary because the RandomData class always generates Utf8 instead
         * of Strings.
         *
         * Nonetheless, perhaps Record.equals() and descendants including Map
         * equality, etc, should treat any pair of String and Utf8 as equal if
         * string.equals(utf8.toString()). Perhaps Avro internals should
         * arrange to have the utf8 object always on the left hand side of
         * equality comparisons, like utf8.equals(obj).
         */
        private void fixup(Object obj) { ... }
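
As an alternative to rewriting the record tree, the comparison itself could 
treat any two character sequences as equal when their text matches, which is 
roughly the string.equals(utf8.toString()) rule suggested in the javadoc. The 
sketch below is an illustration only: TextAwareEquals and sameValue are 
hypothetical names, StringBuilder stands in for Utf8 (both are CharSequences 
that String.equals() rejects), and Avro records themselves are not handled 
because the example avoids depending on Avro.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical comparison helper; not part of Avro.
final class TextAwareEquals {

    /** Deep equality that compares CharSequences by their text content. */
    static boolean sameValue(Object a, Object b) {
        if (a instanceof CharSequence && b instanceof CharSequence) {
            return a.toString().equals(b.toString());
        }
        if (a instanceof List && b instanceof List) {
            List<?> x = (List<?>) a;
            List<?> y = (List<?>) b;
            if (x.size() != y.size()) {
                return false;
            }
            Iterator<?> i = x.iterator();
            Iterator<?> j = y.iterator();
            while (i.hasNext()) {
                if (!sameValue(i.next(), j.next())) {
                    return false;
                }
            }
            return true;
        }
        if (a instanceof Map && b instanceof Map) {
            Map<?, ?> x = (Map<?, ?>) a;
            Map<?, ?> y = (Map<?, ?>) b;
            if (x.size() != y.size()) {
                return false;
            }
            // Index one side by key text so String and non-String keys line up.
            Map<String, Object> byText = new HashMap<>();
            for (Map.Entry<?, ?> e : y.entrySet()) {
                byText.put(String.valueOf(e.getKey()), e.getValue());
            }
            for (Map.Entry<?, ?> e : x.entrySet()) {
                String key = String.valueOf(e.getKey());
                if (!byText.containsKey(key) || !sameValue(e.getValue(), byText.get(key))) {
                    return false;
                }
            }
            return true;
        }
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        Object text = "abc";
        Object other = new StringBuilder("abc");   // stand-in for a Utf8 value
        System.out.println(text.equals(other));    // String.equals() rejects non-String arguments
        System.out.println(sameValue(text, other));
    }
}
```

Because the helper is symmetric, it does not matter which operand ends up on 
the left hand side of the comparison.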


To summarize, there are really three separate issues here. I'm submitting them 
all in one bug report. Feel free to open separate JIRA issues if that's deemed 
more appropriate.
