[ 
https://issues.apache.org/jira/browse/HADOOP-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631694#action_12631694
 ] 

Chris Douglas commented on HADOOP-3638:
---------------------------------------

* MapTask
** In getIndexInformation and writeSingleSpillIndexToFile, ArrayList::get 
never returns null for an out-of-range index; it throws 
IndexOutOfBoundsException when the index is past the last element. Both should 
compare the index against size() instead of testing for null.
** I don't understand the way the indexFileName path array is used in 
mergeParts. Its contents aren't initialized, but it's dereferenced in a debug 
statement and initialized with quasi-meaningful data as the method exits. At 
the start of mergeParts, indexCacheList.size() and numSpills should be 
sufficient to know which spills hit disk (which would also permit the rename 
logic to be moved from writeSingleSpillIndexToFile). After spill indices are 
read into memory, they can be deleted from disk immediately, so cleanup doesn't 
need to wait until the end of mergeParts.
** The IndexRecord arrays are not needed after the segment list is built and 
can be nulled for garbage collection.
* IndexRecord
** Re-thrown exceptions should include their cause, but since EOFException is 
an IOException, it should probably just be allowed to propagate out of 
readIndexFile.
* IndexCache
** Discussed offline
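
To make the ArrayList and EOFException points above concrete, here is a minimal, 
self-contained sketch. It is not the patch's code: the class 
SpillIndexSketch, the methods cachedIndexOrNull, readOffset and 
readOffsetWrapped, and the use of String as the cached element type are all 
hypothetical stand-ins chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; names do not come from the patch.
public class SpillIndexSketch {

    // Returns the cached entry for a spill, or null if that spill is not cached.
    static String cachedIndexOrNull(List<String> indexCacheList, int spill) {
        // ArrayList.get(spill) throws IndexOutOfBoundsException for an
        // out-of-range index; it does not return null. Check size() first.
        if (spill < 0 || spill >= indexCacheList.size()) {
            return null;
        }
        return indexCacheList.get(spill);
    }

    // Reads one long from the buffer. A short buffer raises EOFException,
    // which is already an IOException, so it is simply allowed to propagate.
    static long readOffset(byte[] raw) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            return in.readLong();
        }
    }

    // If an exception is re-thrown anyway, pass the original along as the cause
    // so the stack trace of the underlying failure is preserved.
    static long readOffsetWrapped(byte[] raw) throws IOException {
        try {
            return readOffset(raw);
        } catch (EOFException e) {
            throw new IOException("Truncated index record", e);
        }
    }

    public static void main(String[] args) {
        List<String> cache = new ArrayList<>();
        cache.add("spill0");
        System.out.println("cached spill 0: " + cachedIndexOrNull(cache, 0));
        System.out.println("cached spill 1: " + cachedIndexOrNull(cache, 1));
        try {
            readOffsetWrapped(new byte[3]); // fewer than the 8 bytes a long needs
        } catch (IOException e) {
            System.out.println("cause preserved: " + (e.getCause() instanceof EOFException));
        }
    }
}
```

The size() comparison keeps the "not cached" case on the normal control path 
instead of relying on an exception, and readOffset shows that no catch block 
is needed at all when the caller already declares IOException.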

> Cache the iFile index files in memory to reduce seeks during map output 
> serving
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-3638
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3638
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.17.0
>            Reporter: Devaraj Das
>            Assignee: Jothi Padmanabhan
>             Fix For: 0.19.0
>
>         Attachments: hadoop-3638-v1.patch, hadoop-3638-v2.patch, 
> hadoop-3638-v3.patch, hadoop-3638-v4.patch
>
>
> The iFile index files can be cached in memory to reduce seeks during map 
> output serving.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
