Doug Cutting commented on AVRO-1917:

Can you please post some code that illustrates this?  That way we can better 
evaluate how to improve things.  Ideally this would be a unit test with a 
custom DatumReader that rejects data matching certain criteria.  Thanks!

> DataFileStream Skips Blocks with hasNext and nextBlock calls
> ------------------------------------------------------------
>                 Key: AVRO-1917
>                 URL: https://issues.apache.org/jira/browse/AVRO-1917
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>            Reporter: Michael Coon
> We have a situation where there are potentially large segments of data 
> embedded in an Avro data item. Sometimes an upstream system becomes 
> corrupted and adds hundreds of thousands of array items to the structure. 
> When I try to read such an item as a datum record, it blows the heap 
> immediately. 
> To catch this situation, I created a custom DatumReader that checks the 
> sizes of arrays and byte[] fields and, when a threshold is exceeded, throws 
> a custom exception; I detect that exception and skip the corrupted item in 
> the file. However, to accomplish this try-catch-skip, I had to call hasNext 
> and then nextBlock to get the ByteBuffer to pass to my reader. 
> Unfortunately, calling "hasNext" followed by "nextBlock" actually skips the 
> first block in the underlying data stream. This is because "nextBlock" 
> itself calls "hasNext", which reads the next block. So my call read one 
> block, and nextBlock's internal call read another, causing bytes to be 
> skipped. My workaround is a do...while loop that catches 
> "NoSuchElementException", but this is not intuitive and I had to review the 
> source code to discover it. The fix is to make hasNext and nextBlock agree 
> on a shared condition so that the hasNext call does not advance the stream 
> by reading the next block.
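The double-advance described above can be modeled with a small self-contained sketch. `BuggyBlockStream` below is a hypothetical stand-in, not Avro's actual DataFileStream: its hasNext() answers by consuming a block, and its nextBlock() calls hasNext() again, so the intuitive check-then-read pattern silently skips a block.

```java
import java.util.NoSuchElementException;

// Hypothetical stand-in for the reported behavior: hasNext() advances the
// stream to read the next block, and nextBlock() calls hasNext() again.
class BuggyBlockStream {
    private final int[] blocks;
    private int pos = 0; // index of the next block to read

    BuggyBlockStream(int... blocks) { this.blocks = blocks; }

    // Answers by actually reading (consuming) the next block.
    boolean hasNext() {
        if (pos < blocks.length) {
            pos++; // advances the underlying stream
            return true;
        }
        return false;
    }

    // Internally calls hasNext(), advancing the stream a second time
    // if the caller already called hasNext() themselves.
    int nextBlock() {
        if (!hasNext()) {
            throw new NoSuchElementException("no more blocks");
        }
        return blocks[pos - 1]; // the block hasNext() just read
    }
}

public class SkippedBlockDemo {
    public static void main(String[] args) {
        BuggyBlockStream s = new BuggyBlockStream(10, 20, 30);

        // Intuitive usage: check, then read. hasNext() consumes block 10,
        // then nextBlock()'s internal hasNext() consumes block 20 and
        // returns it -- block 10 is never seen.
        s.hasNext();
        System.out.println(s.nextBlock()); // prints 20, not 10

        // Reporter's workaround: never call hasNext() yourself; loop on
        // nextBlock() and treat NoSuchElementException as end-of-stream.
        BuggyBlockStream t = new BuggyBlockStream(10, 20, 30);
        try {
            do {
                System.out.println(t.nextBlock()); // 10, 20, 30 in order
            } while (true);
        } catch (NoSuchElementException endOfStream) {
            // expected once the stream is exhausted
        }
    }
}
```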

This message was sent by Atlassian JIRA
