[jira] Commented: (HADOOP-5266) Values Iterator should support "mark" and "reset"

Devaraj Das (JIRA) Wed, 15 Apr 2009 06:51:38 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-5266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699195#action_12699195
 ]


Devaraj Das commented on HADOOP-5266:
-------------------------------------

Some points:
1. Add a comment in IFile.Writer.close() explaining the keyClass != null check.
Also add the clear to the MarkableIterator interface.
2. A Counter for the number of times values are iterated over would be nice to 
have.
You can probably improve how the firstkeybytes/firstvaluebytes are written by 
passing the Serializer the stream corresponding to the BackupStore, rather 
than making a DataOutputBuffer copy of the bytes. Granted, this happens only 
for the first key/value bytes after a mark is called, but it may make sense to 
keep the implementation tight if that doesn't complicate the code much.
3. Remove values.clear() from the ReduceValuesIterator iteration.
4. Should Task.ValuesIterator.readNextValue compute the length as 
"nextValueBytes.getLength() - nextValueBytes.getPosition()"?
5. The size of the MemoryCache in BackupStore should probably be a fraction of 
mapred.job.reduce.input.buffer.percent.
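
To illustrate the arithmetic behind point 4, here is a minimal sketch. It 
assumes getLength() reports the end offset of the valid data in the backing 
array, not the number of unread bytes, so the unread count is the difference 
between the two accessors. PositionedBuffer is a hypothetical stand-in written 
for this example only; it is not Hadoop's DataInputBuffer and is not taken from 
the patch.

```java
/** Hypothetical stand-in for a positioned input buffer (illustration only). */
class PositionedBuffer {
    private final byte[] data;
    private int position;      // offset of the next byte to read
    private final int length;  // assumed: end offset of valid data, not a byte count

    PositionedBuffer(byte[] data, int start, int len) {
        this.data = data;
        this.position = start;
        this.length = start + len;
    }

    int getPosition() { return position; }

    int getLength() { return length; }

    /** Reads one byte, or returns -1 once the valid region is exhausted. */
    int read() {
        return position < length ? (data[position++] & 0xff) : -1;
    }

    /** Bytes still unread: end offset minus current position. */
    int remaining() {
        return getLength() - getPosition();
    }
}

class RemainingBytesDemo {
    public static void main(String[] args) {
        byte[] backing = new byte[16];
        // Valid data occupies offsets 4..13 of the backing array.
        PositionedBuffer buf = new PositionedBuffer(backing, 4, 10);
        buf.read();
        buf.read();
        System.out.println(buf.getLength());  // 14
        System.out.println(buf.remaining());  // 8
    }
}
```

Under that assumption, using getLength() alone would overstate the value size 
whenever the buffer does not start at offset 0 or has already been partly read.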

> Values Iterator should support "mark" and "reset"
> -------------------------------------------------
>
>                 Key: HADOOP-5266
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5266
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Jothi Padmanabhan
>            Assignee: Jothi Padmanabhan
>             Fix For: 0.21.0
>
>         Attachments: hadoop-5266-v1.patch
>
>
> Some users have expressed interest in having a mark-reset functionality on 
> values iterator. Users can call mark() at any point during the iteration 
> process and a subsequent reset() should move the iterator to the last value 
> emitted when mark() was called. 
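
The mark/reset behaviour described above could be sketched roughly as follows. 
This is only an in-memory illustration, not the implementation in the attached 
patch: the class name BufferedMarkableIterator and the method name clearMark() 
are hypothetical, and the sketch assumes that reset() replays the values that 
were returned after the most recent mark().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical in-memory sketch of an iterator with mark/reset support. */
class BufferedMarkableIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final List<T> buffer = new ArrayList<>(); // values held for replay
    private int replayPos = 0;                        // next buffered value to return
    private boolean marked = false;

    BufferedMarkableIterator(Iterator<T> source) {
        this.source = source;
    }

    /** Remember the current position; values read from here on are buffered. */
    void mark() {
        buffer.subList(0, replayPos).clear(); // drop values before the new mark
        replayPos = 0;
        marked = true;
    }

    /** Rewind to the position of the last mark(). */
    void reset() {
        if (!marked) {
            throw new IllegalStateException("reset() called without mark()");
        }
        replayPos = 0;
    }

    /** Forget the mark; buffered values not yet replayed are still returned. */
    void clearMark() {
        buffer.subList(0, replayPos).clear();
        replayPos = 0;
        marked = false;
    }

    @Override
    public boolean hasNext() {
        return replayPos < buffer.size() || source.hasNext();
    }

    @Override
    public T next() {
        if (replayPos < buffer.size()) {
            // Serve from the buffer; keep the value only while a mark is active.
            return marked ? buffer.get(replayPos++) : buffer.remove(0);
        }
        T value = source.next();
        if (marked) {
            buffer.add(value);
            replayPos++;
        }
        return value;
    }
}

class MarkResetDemo {
    public static void main(String[] args) {
        BufferedMarkableIterator<Integer> it =
            new BufferedMarkableIterator<>(Arrays.asList(1, 2, 3, 4).iterator());
        System.out.println(it.next()); // 1
        it.mark();
        System.out.println(it.next()); // 2
        System.out.println(it.next()); // 3
        it.reset();
        System.out.println(it.next()); // 2 (replayed)
    }
}
```

A real reduce-side implementation would presumably have to spill buffered 
values to disk once they exceed a memory budget, which this sketch does not 
attempt.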

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
