[ https://issues.apache.org/jira/browse/AVRO-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
wolfgang hoschek updated AVRO-1058:
-----------------------------------
Fix Version/s: 1.7.0
> invalid int encoding with binary format
> ---------------------------------------
>
> Key: AVRO-1058
> URL: https://issues.apache.org/jira/browse/AVRO-1058
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.6.2, 1.6.3, 1.7.0
> Reporter: wolfgang hoschek
> Fix For: 1.7.0
>
> Attachments: TestRandomRecord.java
>
>
> The Java binary format sometimes throws an "invalid int encoding"
> exception and fails to roundtrip a record, even though the JSON format
> roundtrips the same record just fine.
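For context on the exception message: Avro's binary encoding stores int values as zig-zag, variable-length integers, so a well-formed int occupies at most five bytes. The sketch below is a stdlib-only illustration of that scheme, with a decoder that rejects a value whose fifth byte still has the continuation bit set. The class and method names are hypothetical; this is not Avro's actual code, and it does not identify the cause of the failure reported here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration class; not part of Avro.
class ZigZagIntSketch {

    // Zig-zag maps signed ints to unsigned ones so that values of small
    // magnitude (positive or negative) need few bytes.
    static byte[] encodeInt(int n) {
        int v = (n << 1) ^ (n >> 31);
        ByteArrayOutputStream out = new ByteArrayOutputStream(5);
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80); // low 7 bits plus continuation bit
            v >>>= 7;
        }
        out.write(v); // final byte, continuation bit clear
        return out.toByteArray();
    }

    // Reads at most five bytes; a fifth byte with the continuation bit
    // still set cannot belong to a 32-bit value and is rejected.
    static int decodeInt(InputStream in) throws IOException {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            int b = in.read();
            if (b < 0) {
                throw new EOFException();
            }
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return (result >>> 1) ^ -(result & 1); // undo zig-zag
            }
        }
        throw new IOException("Invalid int encoding");
    }

    public static void main(String[] args) throws IOException {
        int[] samples = {0, -1, 1, 350, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int n : samples) {
            byte[] bytes = encodeInt(n);
            int back = decodeInt(new ByteArrayInputStream(bytes));
            System.out.println(n + " -> " + bytes.length + " byte(s) -> " + back);
        }
    }
}
```

A decoder of this shape fails whenever it starts reading an int at the wrong offset, which is one way a length prefix for a large string could be misread; whether that is what happens in this issue is an assumption, not something the report establishes.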
> In addition, there is a separate bug: with both the binary and JSON formats,
> read() sometimes keeps returning null and never throws EOFException to
> indicate end-of-stream, which leads to an infinite loop. This causes an
> OutOfMemoryError in the test driver because it keeps adding null to a list
> of records forever.
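The failure mode described above can be sketched without Avro. The example below assumes a driver that collects records until read() throws EOFException, and adds an upper bound so that a reader which never signals end-of-stream fails fast instead of exhausting the heap. RecordSource, readAll, and maxRecords are hypothetical names for illustration; they are not Avro interfaces and not the code in the attached test.

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Avro or the attached test.
class BoundedReadLoopSketch {

    /** Stand-in for a record reader that should throw EOFException at end of stream. */
    interface RecordSource {
        Object read() throws IOException;
    }

    /**
     * Collects records until EOFException. If more than maxRecords are read
     * without an EOFException, gives up rather than growing the list forever.
     */
    static List<Object> readAll(RecordSource source, int maxRecords) throws IOException {
        List<Object> records = new ArrayList<>();
        while (true) {
            Object record;
            try {
                record = source.read();
            } catch (EOFException e) {
                return records; // normal end of stream
            }
            if (records.size() >= maxRecords) {
                throw new IllegalStateException(
                    "read more than " + maxRecords + " records without EOFException");
            }
            records.add(record); // may be null if the reader misbehaves
        }
    }
}
```

Without the bound, a source whose read() always returns null makes this loop append null indefinitely, which matches the OutOfMemoryError described in the report.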
> The attached Java test case demonstrates the problems. It walks all
> *.avsc and *.avpr files in the code base, generates random records based on
> those schemas, roundtrips the records, and then compares the records before
> and after the roundtrip. To see it fail, comment out portions of the
> following snippet:
> if (roundtripType == RoundtripType.BINARY_AVRO &&
>     schemaFile.getName().equals("weather.avsc") && i >= 350) {
>   continue; // FIXME tmp work-around for avro bug (invalid int encoding on large string)
> }
> if (roundtripType == RoundtripType.BINARY_AVRO &&
>     schemaFile.getName().equals("Json.avsc") && i >= 1) {
>   continue; // FIXME tmp work-around for avro bug (invalid int encoding on large string)
> }
> if (roundtripType == RoundtripType.BINARY_AVRO &&
>     schemaFile.getName().equals("WordCount.avsc") && i >= 2) {
>   continue; // FIXME tmp work-around for avro bug (invalid int encoding on large string)
> }
> if (roundtripType == RoundtripType.BINARY_AVRO &&
>     schemaFile.getName().equals("mr_events.avpr") && i >= 0) {
>   continue; // FIXME tmp work-around for avro bug (invalid int encoding on large string)
> }
> if (schemaFile.getName().equals("OnTheClasspath.avsc")) {
>   continue; // FIXME tmp work-around for avro bug (OutOfMemoryError)
> }
> if (schemaFile.getName().equals("OnTheClasspath.avpr")) {
>   continue; // FIXME tmp work-around for avro bug (OutOfMemoryError)
> }
> if (schemaFile.getName().equals("import.avpr")) {
>   continue; // FIXME tmp work-around for avro bug (OutOfMemoryError)
> }
> if (schemaFile.getName().equals("namespaces.avpr")) {
>   continue; // FIXME tmp work-around for avro bug (OutOfMemoryError)
> }
> Finally, there is a third, separate issue, which is described in the javadoc
> for the test method fixup():
> /**
>  * You can trigger Record.equals() failures by modifying RandomData to spit
>  * out Strings rather than Utf8 objects.
>  *
>  * This hack replaces all occurrences of Utf8 objects with String objects in
>  * the given avro record tree. This is sometimes necessary to make
>  * Record.equals() work correctly because Avro deserialization deserializes
>  * String objects as Utf8 objects, and String.equals(Utf8) returns false
>  * even if Utf8.equals(String) would return true.
>  *
>  * In this particular test scenario this fixup hack might not be necessary
>  * because the RandomData class always generates Utf8 instead of Strings.
>  *
>  * Nonetheless, perhaps Record.equals() and descendants including Map
>  * equality, etc, should treat any pair of String and Utf8 as equal if
>  * string.equals(utf8.toString()). Perhaps Avro internals should arrange to
>  * have the utf8 object always on the left hand side of equality
>  * comparisons, like utf8.equals(obj).
>  */
> private void fixup(Object obj) { ... }
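The asymmetry the javadoc describes follows from String.equals() returning false for any argument that is not itself a String. The sketch below shows that with a stand-in class and a normalization pass that converts every CharSequence in a nested structure to String before comparison. Utf8Like and normalize are hypothetical names; the sketch handles only Map and List and returns copies, whereas the attached fixup() operates on an Avro record tree, so treat it as an illustration of the idea rather than the method in the attachment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of Avro or the attached test.
class StringNormalizeSketch {

    /** Stand-in for a non-String CharSequence such as Avro's Utf8. */
    static final class Utf8Like implements CharSequence {
        private final String value;
        Utf8Like(String value) { this.value = value; }
        @Override public int length() { return value.length(); }
        @Override public char charAt(int index) { return value.charAt(index); }
        @Override public CharSequence subSequence(int start, int end) {
            return value.subSequence(start, end);
        }
        @Override public String toString() { return value; }
    }

    /** Returns a copy of obj in which every CharSequence has been replaced by a String. */
    static Object normalize(Object obj) {
        if (obj instanceof CharSequence) {
            return obj.toString();
        }
        if (obj instanceof Map) {
            Map<Object, Object> copy = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) obj).entrySet()) {
                copy.put(normalize(e.getKey()), normalize(e.getValue()));
            }
            return copy;
        }
        if (obj instanceof List) {
            List<Object> copy = new ArrayList<>();
            for (Object item : (List<?>) obj) {
                copy.add(normalize(item));
            }
            return copy;
        }
        return obj; // numbers, booleans, null, and anything else pass through
    }

    public static void main(String[] args) {
        Object before = Arrays.<Object>asList(new Utf8Like("abc"), 42);
        Object expected = Arrays.<Object>asList("abc", 42);
        System.out.println("raw equal:        " + expected.equals(before));
        System.out.println("normalized equal: " + expected.equals(normalize(before)));
    }
}
```

Normalizing both sides to String sidesteps the question of which operand's equals() is called, at the cost of copying the structure.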
> To summarize, there are really three separate issues here. I'm submitting
> them all in one bug report. Feel free to open separate JIRA issues if that's
> deemed more appropriate.