Daniel Dai (JIRA)
Mon, 08 Feb 2010 16:59:50 -0800
[ https://issues.apache.org/jira/browse/PIG-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1231:
----------------------------
Status: Patch Available (was: Open)
> DataBagIterator.hasNext() should be idempotent
> ----------------------------------------------
>
> Key: PIG-1231
> URL: https://issues.apache.org/jira/browse/PIG-1231
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.6.0
> Reporter: Daniel Dai
> Assignee: Daniel Dai
> Fix For: 0.6.0
>
> Attachments: PIG-1231-1.patch
>
>
> DataBagIterator.hasNext() is not repeatable in some situations. This is not
> acceptable because the name hasNext() implies that it is idempotent. While
> hasNext() returns true, it is repeatable, but once hasNext() returns false, it
> is not. BagFormat calls DataBagIterator.hasNext() on the assumption that it
> is always idempotent, which leads to some mysterious errors. Here is one
> error we saw:
> Caused by: java.io.IOException: Stream closed
> at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:145)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:189)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
> at java.io.DataInputStream.readByte(DataInputStream.java:248)
> at org.apache.pig.data.DefaultTuple.readFields(DefaultTuple.java:278)
> at org.apache.pig.data.DefaultDataBag$DefaultDataBagIterator.readFromFile(DefaultDataBag.java:237)
> ... 20 more
> This happens because we call hasNext(), which reaches EOF and closes the
> file. Then we call hasNext() again on the assumption that it is idempotent.
> However, the stream is already closed, so the read fails with this error.
> The fix will be applied to DefaultDataBagIterator, DistinctDataBagIterator,
> CachedBagIterator, and SortedDataBagIterator.
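
The failure mode described above can be illustrated with a minimal sketch. This is not the PIG-1231 patch: the class names (IdempotentHasNextSketch, SpillIterator) and the helper over() are hypothetical, and the iterator reads plain ints from an in-memory stream rather than tuples from a spill file. It only shows one way to make hasNext() idempotent: remember that EOF was reached, and buffer the record read ahead, so that repeated calls never touch the closed stream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class IdempotentHasNextSketch {

    /** Hypothetical stand-in for a spill-file iterator; reads ints, not tuples. */
    static final class SpillIterator implements Iterator<Integer> {
        private final DataInputStream in;
        private Integer lookahead;  // record read ahead by hasNext(), not yet returned
        private boolean exhausted;  // set once EOF is seen; the stream is closed by then

        SpillIterator(DataInputStream in) {
            this.in = in;
        }

        @Override
        public boolean hasNext() {
            if (exhausted) {
                return false;       // never read from the closed stream again
            }
            if (lookahead != null) {
                return true;        // repeated call: reuse the buffered record
            }
            try {
                lookahead = in.readInt();
                return true;
            } catch (EOFException eof) {
                exhausted = true;   // remember EOF before closing the stream
                try {
                    in.close();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                return false;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        @Override
        public Integer next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            Integer result = lookahead;
            lookahead = null;
            return result;
        }
    }

    /** Builds an iterator over an in-memory "spill file" holding the given values. */
    static SpillIterator over(int... values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (int v : values) {
                out.writeInt(v);
            }
            out.flush();
            // BufferedInputStream throws "Stream closed" on reads after close(),
            // which is the behaviour seen in the stack trace above.
            return new SpillIterator(new DataInputStream(
                    new BufferedInputStream(new ByteArrayInputStream(bytes.toByteArray()))));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SpillIterator it = over(1, 2);
        while (it.hasNext()) {
            System.out.println(it.next());
        }
        // Calling hasNext() again after EOF is safe and keeps returning false.
        System.out.println(it.hasNext());
        System.out.println(it.hasNext());
    }
}
```

Without the exhausted flag, the second hasNext() call after EOF would attempt another read on the closed BufferedInputStream and fail with "Stream closed", as in the trace above.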