Daniel Dai (JIRA)
Mon, 08 Feb 2010 11:51:52 -0800
DataBagIterator.hasNext() should be idempotent
----------------------------------------------
Key: PIG-1231
URL: https://issues.apache.org/jira/browse/PIG-1231
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Fix For: 0.6.0
The current implementation of DataBagIterator.hasNext() fetches the next
tuple on every call, so calling hasNext() several times in a row fetches more
than one tuple. This is confusing because the name hasNext() implies that the
call is idempotent. BagFormat misuses DataBagIterator.hasNext() because of
this, which leads to some mysterious errors. Here is one error we saw:
Caused by: java.io.IOException: Stream closed
    at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:145)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:189)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    at java.io.DataInputStream.readByte(DataInputStream.java:248)
    at org.apache.pig.data.DefaultTuple.readFields(DefaultTuple.java:278)
    at org.apache.pig.data.DefaultDataBag$DefaultDataBagIterator.readFromFile(DefaultDataBag.java:237)
    ... 20 more
This happens because we call hasNext(), which reaches EOF and closes the file.
We then call hasNext() again on the assumption that it is idempotent, but the
stream is already closed, so we get this error.
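
The failure mode can be reproduced outside Pig with a small sketch. Everything
in it is hypothetical and only illustrates the pattern: EagerHasNextDemo,
EagerIterator and encode() are made-up names, and plain ints stand in for
tuples. It assumes a stream-backed iterator whose hasNext() reads on every
call and closes the stream at EOF.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a file-backed bag iterator; this is not Pig code.
final class EagerHasNextDemo {

    static final class EagerIterator {
        private final DataInputStream in;
        private int current;

        EagerIterator(byte[] data) {
            in = new DataInputStream(
                    new BufferedInputStream(new ByteArrayInputStream(data)));
        }

        // Non-idempotent: every call reads one more element from the stream.
        boolean hasNext() throws IOException {
            try {
                current = in.readInt();
                return true;
            } catch (EOFException e) {
                in.close(); // any later read now fails
                return false;
            }
        }

        int next() {
            return current;
        }
    }

    // Serializes ints in the format DataInputStream.readInt() expects.
    static byte[] encode(int... values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (int v : values) {
                out.writeInt(v);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        EagerIterator it = new EagerIterator(encode(1, 2, 3));
        it.hasNext();                  // reads 1
        it.hasNext();                  // reads 2; element 1 is silently lost
        System.out.println(it.next()); // prints 2
        it.hasNext();                  // reads 3
        it.hasNext();                  // hits EOF and closes the stream
        try {
            it.hasNext();              // reads from the closed stream
        } catch (IOException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

The sketch shows two symptoms of the same problem: a repeated hasNext() skips
an element, and a hasNext() after EOF reads from a closed stream and throws an
IOException.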
The fix will go into DefaultDataBagIterator, DistinctDataBagIterator,
CachedBagIterator, and SortedDataBagIterator.
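
One common way to make hasNext() idempotent is a one-element look-ahead
buffer: hasNext() fetches at most one element and remembers it, and next()
hands that element out. The sketch below illustrates the idea under stated
assumptions and is not the actual patch. LookAheadIntIterator and encode() are
hypothetical names, and ints stand in for tuples.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical look-ahead iterator; illustrates the pattern, not Pig's classes.
final class LookAheadIntIterator implements Iterator<Integer> {
    private final DataInputStream in;
    private Integer buffered;  // element fetched by hasNext() but not yet returned
    private boolean exhausted; // true once EOF was seen and the stream was closed

    LookAheadIntIterator(byte[] data) {
        in = new DataInputStream(
                new BufferedInputStream(new ByteArrayInputStream(data)));
    }

    @Override
    public boolean hasNext() {
        if (buffered != null) {
            return true;  // already fetched; repeated calls do no I/O
        }
        if (exhausted) {
            return false; // never touch the closed stream again
        }
        try {
            try {
                buffered = in.readInt();
            } catch (EOFException e) {
                exhausted = true;
                in.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffered != null;
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Integer result = buffered;
        buffered = null; // consume the look-ahead element
        return result;
    }

    // Serializes ints in the format DataInputStream.readInt() expects.
    static byte[] encode(int... values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (int v : values) {
                out.writeInt(v);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Iterator<Integer> it = new LookAheadIntIterator(encode(7, 8));
        System.out.println(it.hasNext() && it.hasNext()); // repeated calls are safe
        System.out.println(it.next());
        System.out.println(it.next());
        System.out.println(it.hasNext() || it.hasNext()); // still safe after EOF
    }
}
```

With this shape, any number of consecutive hasNext() calls triggers at most
one read, and calls made after EOF return false without touching the stream.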