Looking at the ticket (sorry, can't update in JIRA at the moment): The value 101 in the raw data is the integer -51.
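For reference, here is roughly why 101 comes out as -51. The Avro spec encodes ints with variable-length zig-zag coding. The sketch below is a standalone illustration of that encoding, not the actual DirectBinaryDecoder source; the class name ZigZagSketch and its method names are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration class; not part of Avro.
class ZigZagSketch {

    // Undo zig-zag coding: 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
    static int zigZagDecode(int n) {
        return (n >>> 1) ^ -(n & 1);
    }

    // Read one variable-length int: 7 payload bits per byte, low-order
    // group first; a set high bit means another byte follows.
    static int readInt(InputStream in) throws IOException {
        int raw = 0;
        int shift = 0;
        while (shift < 35) { // a 32-bit int needs at most 5 bytes
            int b = in.read();
            if (b < 0) {
                throw new EOFException();
            }
            raw |= (b & 0x7f) << shift;
            if ((b & 0x80) == 0) {
                return zigZagDecode(raw);
            }
            shift += 7;
        }
        throw new IOException("Invalid int encoding");
    }

    public static void main(String[] args) throws IOException {
        // The single byte 101 has its high bit clear, so it is a complete
        // varint with raw value 101, which zig-zag decodes to -51.
        System.out.println(zigZagDecode(101));
        System.out.println(readInt(new ByteArrayInputStream(new byte[] {101})));
    }
}
```

Both lines print -51. So a negative result from readInt() is a legitimate decoding of that byte; the question is whether an int was supposed to be read at that position at all.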
Therefore the cause is either:

* Corrupt data
* Improper schema used to read (the data was written with a different schema than the one the reader is configured to use as its writer's schema)

You might want to look at the bytes near that 101 and see if the data looks like it is from a different schema than expected. Given that some other tools can read it, it is likely the latter case: the other tools are reading it with a different schema.

Note, a reader requires _two_ schemas: the schema that the reader wants to interpret the data as, and the schema that was used when the data was written. If the latter is wrong, this sort of thing can happen as Avro tries to read a variable-length item from data that is in the wrong position.

You could also see if BinaryDecoder behaves any differently from DirectBinaryDecoder. The issue is most likely above those, in the code that uses them (the resolver and/or DatumReader).

-Scott

On 1/23/16, 10:59 AM, "Yong Zhang" <[email protected]> wrote:

>Hi, Avro Developers:
>
>Is anyone familiar with the code logic related to
>org.apache.avro.io.DirectBinaryDecoder?
>
>I am asking this question in relation to AVRO-1786, where I believe I am
>facing a bug related to this class.
>
>A valid Avro record is sent from the Mapper to the Reducer, but the Reducer
>cannot read it due to an IndexOutOfBoundException, because the readInt()
>method of this class returns "-51".
>
>I can even dump the local variables of the method in this exception case;
>they are described in the comments area of the Jira ticket.
>
>I don't understand the internal logic of this class, or how the
>readInt() method is implemented. But having an inputstream read of 101 bytes
>cause this method to return a negative number, which then causes an
>IndexOutofBoundException in the following method, looks like a bug to me.
>
>Can anyone who understands this class's logic confirm whether this is a bug
>or not? If it is a bug, what is the best way to fix it?
>
>I can consistently reproduce this bug on our production cluster, which
>means I can verify whether any code fix works for this case.
>
>Let me know if you have any questions related to this JIRA.
>
>Thanks
>
>Yong
>
