[
https://issues.apache.org/jira/browse/HADOOP-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630497#action_12630497
]
Jothi Padmanabhan commented on HADOOP-3638:
-------------------------------------------
What would be a reasonable amount of memory that can be set aside for the Index
Cache at the Map side?
Each individual record is 24 bytes (3 longs)
Let num reducers = R
Let num map slots = S
Let total number of Spill Files = M
Total number of entries per Map task = M*R, consuming M*R*24 bytes
For a *node* running S Map tasks (slots) at a time, total memory
consumed = S*M*R*24 bytes
If M = 100, S = 6, R = 100, then
Total Memory Consumed = 6*100*100*24 ~= 1.4 MB
I think 1.4 MB is a very reasonable amount for the index cache, at the node
level.
So, do we need to worry about the memory limit at the map side at all? We could
just do LRU eviction at the task tracker level alone.
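The estimate above can be sketched as a quick back-of-envelope calculation (the function name and parameters here are illustrative only, not identifiers from the Hadoop codebase):

```python
RECORD_BYTES = 24  # each index record is 3 longs = 24 bytes

def index_cache_bytes(map_slots, spill_files, reducers, record_bytes=RECORD_BYTES):
    """Estimated memory consumed by cached index entries for all map slots on one node."""
    return map_slots * spill_files * reducers * record_bytes

# The numbers used in the comment: S = 6, M = 100, R = 100
total = index_cache_bytes(map_slots=6, spill_files=100, reducers=100)
print(total)  # 1440000 bytes, i.e. ~1.4 MB
```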
> Cache the iFile index files in memory to reduce seeks during map output
> serving
> -------------------------------------------------------------------------------
>
> Key: HADOOP-3638
> URL: https://issues.apache.org/jira/browse/HADOOP-3638
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Affects Versions: 0.17.0
> Reporter: Devaraj Das
> Assignee: Jothi Padmanabhan
> Fix For: 0.19.0
>
> Attachments: hadoop-3638-v1.patch, hadoop-3638-v2.patch
>
>
> The iFile index files can be cached in memory to reduce seeks during map
> output serving.