[ 
https://issues.apache.org/jira/browse/AVRO-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wolfgang hoschek updated AVRO-1058:
-----------------------------------

    Fix Version/s: 1.7.0
    
> invalid int encoding with binary format
> ---------------------------------------
>
>                 Key: AVRO-1058
>                 URL: https://issues.apache.org/jira/browse/AVRO-1058
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.6.2, 1.6.3, 1.7.0
>            Reporter: wolfgang hoschek
>             Fix For: 1.7.0
>
>         Attachments: TestRandomRecord.java
>
>
> The Java binary format sometimes fails with an "invalid int encoding"
> exception and cannot roundtrip a record, even though the JSON format
> roundtrips the same record just fine.
> In addition, there is a separate bug: with both the binary and JSON formats,
> read() sometimes returns null forever and never throws EOFException to
> indicate end-of-stream, which leads to an infinite loop. This causes an
> OutOfMemoryError in the test driver because it keeps adding null to a list
> of records.
> The attached test case (TestRandomRecord.java) demonstrates the problems.
> It walks all *.avsc and *.avpr files in the code base, generates random
> records based on those schemas, roundtrips the records, and then compares
> the records before and after the roundtrip. To see it fail, comment out
> portions of the following snippet:
> if (roundtripType == RoundtripType.BINARY_AVRO
>       && schemaFile.getName().equals("weather.avsc") && i >= 350) {
>       continue; // FIXME tmp work-around for avro bug (invalid int encoding on large string)
> }
> if (roundtripType == RoundtripType.BINARY_AVRO
>       && schemaFile.getName().equals("Json.avsc") && i >= 1) {
>       continue; // FIXME tmp work-around for avro bug (invalid int encoding on large string)
> }
> if (roundtripType == RoundtripType.BINARY_AVRO
>       && schemaFile.getName().equals("WordCount.avsc") && i >= 2) {
>       continue; // FIXME tmp work-around for avro bug (invalid int encoding on large string)
> }
> if (roundtripType == RoundtripType.BINARY_AVRO
>       && schemaFile.getName().equals("mr_events.avpr") && i >= 0) {
>       continue; // FIXME tmp work-around for avro bug (invalid int encoding on large string)
> }
> if (schemaFile.getName().equals("OnTheClasspath.avsc")) {
>       continue; // FIXME tmp work-around for avro bug (OutOfMemoryError)
> }
> if (schemaFile.getName().equals("OnTheClasspath.avpr")) {
>       continue; // FIXME tmp work-around for avro bug (OutOfMemoryError)
> }
> if (schemaFile.getName().equals("import.avpr")) {
>       continue; // FIXME tmp work-around for avro bug (OutOfMemoryError)
> }
> if (schemaFile.getName().equals("namespaces.avpr")) {
>       continue; // FIXME tmp work-around for avro bug (OutOfMemoryError)
> }
> Finally, there is a third separate issue, which is described in the javadoc 
> for test method fixup():
>       /**
>        * You can trigger Record.equals() failures by modifying RandomData to
>        * spit out Strings rather than Utf8 objects.
>        *
>        * This hack replaces all occurrences of Utf8 objects with String objects
>        * in the given Avro record tree. This is sometimes necessary to make
>        * Record.equals() work correctly because Avro deserialization
>        * deserializes String objects as Utf8 objects, and String.equals(Utf8)
>        * returns false even if Utf8.equals(String) would return true.
>        *
>        * In this particular test scenario this fixup hack might not be
>        * necessary because the RandomData class always generates Utf8 instead
>        * of Strings.
>        *
>        * Nonetheless, perhaps Record.equals() and descendants including Map
>        * equality, etc, should treat any pair of String and Utf8 as equal if
>        * string.equals(utf8.toString()). Perhaps Avro internals should arrange
>        * to have the utf8 object always on the left hand side of equality
>        * comparisons, like utf8.equals(obj).
>        */
>       private void fixup(Object obj) { ... }
> To summarize, there are really three separate issues here. I'm submitting 
> them all in one bug report. Feel free to open separate JIRA issues if that's 
> deemed more appropriate.
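
The second issue above (read() returning null forever without an EOFException) suggests a defensive pattern for a test driver: cap the number of reads so that a source that never signals end-of-stream fails fast instead of exhausting the heap. The following is only a minimal sketch under stated assumptions, not the code in the attached test. RecordSource, readAll, and the maxRecords cap are hypothetical names introduced for illustration and are not Avro APIs.

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class BoundedReadLoop {

    /** Hypothetical stand-in for a record reader; not an Avro interface. */
    public interface RecordSource {
        // Expected to throw EOFException at end-of-stream.
        Object read() throws IOException;
    }

    /**
     * Reads until EOFException, but gives up after maxRecords reads so that a
     * source that never signals end-of-stream cannot fill the heap.
     */
    public static List<Object> readAll(RecordSource source, int maxRecords)
            throws IOException {
        List<Object> records = new ArrayList<>();
        while (true) {
            if (records.size() >= maxRecords) {
                throw new IOException(
                    "no end-of-stream after " + maxRecords + " records");
            }
            try {
                records.add(source.read());
            } catch (EOFException eof) {
                return records; // normal termination
            }
        }
    }

    public static void main(String[] args) {
        // A misbehaving source: always returns null, never throws EOFException.
        RecordSource stuck = () -> null;
        try {
            readAll(stuck, 1000);
        } catch (IOException e) {
            System.out.println("aborted: " + e.getMessage());
        }
    }
}
```

A cap is used rather than rejecting null outright, because null may be a legitimate datum for some schemas.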

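The third issue above comes down to equals() behaving differently across CharSequence implementations. Below is a minimal sketch of the content-based comparison that the fixup() javadoc proposes. To keep it free of an Avro dependency, java.lang.StringBuilder stands in for Utf8 (both are CharSequence implementations that are not java.lang.String). The looseEquals helper is hypothetical and does not reflect how Record.equals() is actually implemented.

```java
import java.util.Objects;

public class CharSequenceEquality {

    /**
     * Hypothetical helper: treats any two CharSequence values as equal when
     * their character content matches, and falls back to equals() otherwise.
     */
    public static boolean looseEquals(Object a, Object b) {
        if (a instanceof CharSequence && b instanceof CharSequence) {
            return a.toString().equals(b.toString());
        }
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        String s = "avro";
        // StringBuilder is an assumption-free stand-in for Utf8 in this sketch.
        CharSequence other = new StringBuilder("avro");

        System.out.println(s.equals(other));       // false: String.equals() only accepts String
        System.out.println(looseEquals(s, other)); // true: compared by content
        System.out.println(looseEquals(other, s)); // true: symmetric
    }
}
```

Comparing by content keeps the check symmetric, so it does not matter which operand ends up on the left hand side.
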
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
