[ https://issues.apache.org/jira/browse/HADOOP-5266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699195#action_12699195 ]
Devaraj Das commented on HADOOP-5266:
-------------------------------------
Some points:
1. Add a comment in IFile.Writer.close() explaining the keyClass != null check.
Also, add clear() to the MarkableIterator interface.
2. A Counter for the number of times values are iterated over would be nice to
have. Also, you can probably improve how the firstkeybytes/firstvaluebytes are
written by passing the Serializer the stream corresponding to the BackupStore,
instead of copying the bytes into a DataOutputBuffer. Granted, this happens
only for the first key/value bytes after mark() is called, but it may make
sense to keep the implementation tight if that doesn't complicate the code much.
3. Remove values.clear() from the ReduceValuesIterator iteration.
4. Should Task.ValuesIterator.readNextValue compute the length as
"nextValueBytes.getLength() - nextValueBytes.getPosition()"?
5. The size of the MemoryCache in BackupStore should probably be a fraction of
mapred.job.reduce.input.buffer.percent.
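A minimal sketch of the serializer suggestion in point 2, using only java.io. SimpleSerializer is a hypothetical stand-in modeled on the open/serialize/close shape of Hadoop's org.apache.hadoop.io.serializer.Serializer, StringSerializer is a toy, and a ByteArrayOutputStream stands in for the BackupStore's stream. Both paths write the same bytes; the direct one just skips the temporary buffer and the extra copy.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;

public class DirectSerializeSketch {

  // Hypothetical stand-in modeled on the open/serialize/close shape of
  // Hadoop's Serializer; not the Hadoop interface itself.
  interface SimpleSerializer<T> {
    void open(OutputStream out) throws IOException;
    void serialize(T t) throws IOException;
    void close() throws IOException;
  }

  // Toy serializer: writes a length-prefixed (writeUTF) string.
  static class StringSerializer implements SimpleSerializer<String> {
    private DataOutputStream out;
    public void open(OutputStream os) { out = new DataOutputStream(os); }
    public void serialize(String s) throws IOException { out.writeUTF(s); }
    // Only flushes, so a shared store stream stays open for later writes.
    public void close() throws IOException { out.flush(); }
  }

  // Copying approach: serialize into a temporary buffer, then copy the bytes
  // into the store's stream (one extra buffer, one extra copy).
  static void writeViaCopy(SimpleSerializer<String> ser, String value,
                           OutputStream store) throws IOException {
    ByteArrayOutputStream tmp = new ByteArrayOutputStream();
    ser.open(tmp);
    ser.serialize(value);
    ser.close();
    tmp.writeTo(store);
  }

  // Direct approach: point the serializer at the store's stream itself.
  static void writeDirect(SimpleSerializer<String> ser, String value,
                          OutputStream store) throws IOException {
    ser.open(store);
    ser.serialize(value);
    ser.close();
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream viaCopy = new ByteArrayOutputStream();
    ByteArrayOutputStream direct = new ByteArrayOutputStream();
    writeViaCopy(new StringSerializer(), "first-value", viaCopy);
    writeDirect(new StringSerializer(), "first-value", direct);
    System.out.println(Arrays.equals(viaCopy.toByteArray(), direct.toByteArray())); // true
  }
}
```

One thing the direct path depends on: the serializer's close() must not close the shared store stream. It is worth checking whether the real Serializer's close() closes its underlying stream before opening it directly on the BackupStore's stream.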
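For point 4, a sketch of the arithmetic, assuming (as with a buffer built on ByteArrayInputStream) that getLength() returns the end offset of the valid data rather than its size. PositionedBuffer is a hypothetical stand-in that exposes ByteArrayInputStream's protected pos and count fields. When the data does not start at offset 0, getLength() alone overstates the value's length; subtracting getPosition() gives the number of unread bytes.

```java
import java.io.ByteArrayInputStream;

public class RemainingLengthSketch {

  // Hypothetical stand-in for a DataInputBuffer-style buffer.
  static class PositionedBuffer extends ByteArrayInputStream {
    PositionedBuffer(byte[] data, int start, int length) {
      super(data, start, length);
    }
    int getPosition() { return pos; }  // offset of the next byte to read
    int getLength() { return count; }  // end offset, i.e. start + length
  }

  // Unread bytes: end offset minus current offset.
  static int remaining(PositionedBuffer buf) {
    return buf.getLength() - buf.getPosition();
  }

  public static void main(String[] args) {
    byte[] data = new byte[20];
    // Expose bytes [5, 15): position starts at 5, end offset is 15.
    PositionedBuffer buf = new PositionedBuffer(data, 5, 10);
    System.out.println(buf.getLength());  // 15, not 10
    System.out.println(remaining(buf));   // 10
    buf.skip(4);                          // consume 4 bytes
    System.out.println(remaining(buf));   // 6
  }
}
```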
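For point 5, a sketch of the sizing rule, treating mapred.job.reduce.input.buffer.percent as a ratio of the maximum heap in [0, 1]. memoryCacheBytes and its cacheFraction parameter are hypothetical; the actual fraction would be a design decision for the patch.

```java
public class BackupStoreCacheSizeSketch {

  // Hypothetical helper: the reduce-input budget is maxHeap * inputBufferPercent,
  // and the BackupStore's MemoryCache gets cacheFraction of that budget.
  static long memoryCacheBytes(long maxHeapBytes, float inputBufferPercent,
                               float cacheFraction) {
    if (inputBufferPercent < 0.0f || inputBufferPercent > 1.0f) {
      throw new IllegalArgumentException("inputBufferPercent must be in [0, 1]");
    }
    if (cacheFraction < 0.0f || cacheFraction > 1.0f) {
      throw new IllegalArgumentException("cacheFraction must be in [0, 1]");
    }
    long reduceInputBudget = (long) (maxHeapBytes * (double) inputBufferPercent);
    return (long) (reduceInputBudget * (double) cacheFraction);
  }

  public static void main(String[] args) {
    long heap = 1024L * 1024 * 1024;  // 1 GiB, example value
    // 1 GiB * 0.5 = 512 MiB budget; a quarter of that is 128 MiB.
    System.out.println(memoryCacheBytes(heap, 0.5f, 0.25f));  // 134217728
  }
}
```

If the property is set to 0.0, this rule yields an empty cache, so a minimum size may be worth considering.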
> Values Iterator should support "mark" and "reset"
> -------------------------------------------------
>
> Key: HADOOP-5266
> URL: https://issues.apache.org/jira/browse/HADOOP-5266
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Jothi Padmanabhan
> Assignee: Jothi Padmanabhan
> Fix For: 0.21.0
>
> Attachments: hadoop-5266-v1.patch
>
>
> Some users have expressed interest in having a mark-reset functionality on
> values iterator. Users can call mark() at any point during the iteration
> process and a subsequent reset() should move the iterator to the last value
> emitted when mark() was called.
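A minimal in-memory sketch of the mark()/reset() semantics described above, with a clearMark() in the spirit of point 1 and a count of values returned (replays included) in the spirit of point 2's Counter. MarkableIteratorSketch and its methods are hypothetical, not the patch's API: it remembers every value returned after mark() in a list, and reset() replays them before reading further from the source.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch, not the HADOOP-5266 patch.
public class MarkableIteratorSketch<T> implements Iterator<T> {
  private final Iterator<T> source;
  private List<T> buffer = new ArrayList<>();  // values remembered since mark()
  private int pos = 0;                         // next buffer index to return
  private boolean marked = false;
  private long valuesReturned = 0;             // counts replayed values too

  public MarkableIteratorSketch(Iterator<T> source) { this.source = source; }

  @Override
  public boolean hasNext() {
    return pos < buffer.size() || source.hasNext();
  }

  @Override
  public T next() {
    T value;
    if (pos < buffer.size()) {
      value = buffer.get(pos++);               // replaying after reset()
      if (!marked && pos == buffer.size()) {   // no mark: nothing left to keep
        buffer.clear();
        pos = 0;
      }
    } else {
      value = source.next();                   // NoSuchElementException at end
      if (marked) {
        buffer.add(value);
        pos = buffer.size();
      }
    }
    valuesReturned++;
    return value;
  }

  @Override
  public void remove() {
    throw new UnsupportedOperationException("remove");
  }

  /** Remember the current position; a later reset() rewinds to it. */
  public void mark() {
    // Values before pos are consumed; keep only those still pending replay.
    buffer = new ArrayList<>(buffer.subList(pos, buffer.size()));
    pos = 0;
    marked = true;
  }

  /** Rewind to the position of the last mark(). */
  public void reset() {
    if (!marked) throw new IllegalStateException("reset() without mark()");
    pos = 0;
  }

  /** Drop the mark so already-returned values need not be kept. */
  public void clearMark() {
    buffer = new ArrayList<>(buffer.subList(pos, buffer.size()));
    pos = 0;
    marked = false;
  }

  public long getValuesReturned() { return valuesReturned; }
}
```

For values 1..5, calling next(), mark(), next(), next(), reset() and then draining the iterator yields 1 2 3 2 3 4 5. Keeping object references only works if the source hands out distinct objects; the reduce-side iterator typically deserializes into reused value instances, which is one reason a store of serialized bytes, like the BackupStore discussed in the points above, is needed in practice.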
--
This message is automatically generated by JIRA.