[ https://issues.apache.org/jira/browse/AVRO-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496825#comment-15496825 ]
Doug Cutting commented on AVRO-1917:
------------------------------------
Can you please post some code that illustrates this? That way we can better
evaluate how to improve things. Ideally this would be a unit test with a
custom DatumReader that rejects data meeting certain criteria. Thanks!
> DataFileStream Skips Blocks with hasNext and nextBlock calls
> ------------------------------------------------------------
>
> Key: AVRO-1917
> URL: https://issues.apache.org/jira/browse/AVRO-1917
> Project: Avro
> Issue Type: Bug
> Components: java
> Reporter: Michael Coon
>
> We have a situation where potentially large segments of data are embedded
> in an Avro data item. Sometimes an upstream system becomes corrupted and
> adds hundreds of thousands of array items to the structure. When I try to
> read the item as a Datum record, it blows the heap immediately.
> To catch this situation, I needed to create a custom DatumReader that checks
> the size of arrays and byte[] values and, if one exceeds a threshold, throws
> a custom exception that I catch so I can skip the corrupted item in the
> file. However, to accomplish the try-catch-skip functionality, I had to use
> hasNext and nextBlock to get the ByteBuffer and send it to my reader.
> Unfortunately, calling "hasNext" and then "nextBlock" actually skips the
> first block in the underlying data stream. This is because "nextBlock" calls
> "hasNext", which reads the next block. So I called it, then nextBlock called
> it again, causing bytes to be skipped. My workaround is a do...while loop
> that catches "NoSuchElementException", but this is not intuitive and I had
> to review the code to work out how to do it. The fix is to add a condition
> that both hasNext and nextBlock agree on, so that a call to hasNext does not
> advance past a block that has not been consumed yet.
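
The behaviour described in the report can be modelled without Avro. The
sketch below is a toy, not DataFileStream's actual implementation: the class
names EagerBlockStream, BufferedBlockStream and BlockSkipDemo are
hypothetical, and blocks are plain strings rather than ByteBuffers. It
assumes a hasNext that loads a new block on every call, as the report
describes, and contrasts it with a hasNext that holds at most one pending
block, which is one possible reading of the proposed fix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Toy model (hypothetical): hasNext() loads another block on every call. */
class EagerBlockStream {
    private final Iterator<String> source;
    private String current; // block most recently loaded by hasNext()

    EagerBlockStream(List<String> blocks) { this.source = blocks.iterator(); }

    boolean hasNext() {
        if (!source.hasNext()) { current = null; return false; }
        current = source.next(); // advances even if the last block was never returned
        return true;
    }

    String nextBlock() {
        if (!hasNext()) throw new NoSuchElementException();
        return current;
    }
}

/** Toy model of the proposed fix: hasNext() keeps one pending block until it is taken. */
class BufferedBlockStream {
    private final Iterator<String> source;
    private String pending; // loaded but not yet handed out

    BufferedBlockStream(List<String> blocks) { this.source = blocks.iterator(); }

    boolean hasNext() {
        if (pending == null && source.hasNext()) pending = source.next();
        return pending != null; // repeated calls do not advance
    }

    String nextBlock() {
        if (!hasNext()) throw new NoSuchElementException();
        String block = pending;
        pending = null;
        return block;
    }
}

class BlockSkipDemo {
    /** The intuitive loop: test hasNext(), then fetch the block. */
    static List<String> drainEager(EagerBlockStream in) {
        List<String> seen = new ArrayList<>();
        while (in.hasNext()) seen.add(in.nextBlock());
        return seen;
    }

    /** The reported workaround: never call hasNext(), stop on NoSuchElementException. */
    static List<String> drainEagerWithWorkaround(EagerBlockStream in) {
        List<String> seen = new ArrayList<>();
        try {
            do { seen.add(in.nextBlock()); } while (true);
        } catch (NoSuchElementException endOfStream) {
            // nextBlock() signals that no blocks remain
        }
        return seen;
    }

    /** The same intuitive loop against the buffered model. */
    static List<String> drainBuffered(BufferedBlockStream in) {
        List<String> seen = new ArrayList<>();
        while (in.hasNext()) seen.add(in.nextBlock());
        return seen;
    }

    public static void main(String[] args) {
        List<String> blocks = Arrays.asList("b0", "b1", "b2", "b3");
        System.out.println(drainEager(new EagerBlockStream(blocks)));               // [b1, b3]
        System.out.println(drainEagerWithWorkaround(new EagerBlockStream(blocks))); // [b0, b1, b2, b3]
        System.out.println(drainBuffered(new BufferedBlockStream(blocks)));         // [b0, b1, b2, b3]
    }
}
```

In the eager model the hasNext/nextBlock loop silently drops every other
block, starting with the first, while the do...while workaround sees all of
them. In the buffered model the ordinary loop is safe, because hasNext and
nextBlock agree on a single pending block. Whether DataFileStream itself
behaves like the eager model is exactly what a unit test against the real
class would need to confirm.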