[ https://issues.apache.org/jira/browse/HADOOP-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710274#action_12710274 ]
Sharad Agarwal commented on HADOOP-5369:
----------------------------------------

The test failures are due to HADOOP-5847.

> Small tweaks to reduce MapFile index size
> -----------------------------------------
>
>                 Key: HADOOP-5369
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5369
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: Ben Maurer
>            Assignee: Ben Maurer
>             Fix For: 0.21.0
>
>         Attachments: mapfile.patch, smaller_mapfile.patch,
> smaller_mapfile.patch, smaller_mapfile.patch, smaller_mapfile.patch,
> smaller_mapfile.patch
>
>
> Two minor tweaks can help reduce the memory overhead of the MapFile index a
> bit:
> 1) Because the index file is a sequence file, its length is not known in
> advance. That means the index is built using the standard "multiply the size
> of the buffer on overflow" strategy with a factor of 3/2. With small keys, the
> slack in the index can be substantial. This patch puts a constant upper bound
> on the amount of slack allowed.
> 2) In block-compressed map files the index file often has entries with the
> same offset (because the compressed block held more keys than the index
> interval). Entries with identical offsets do not help MapFile do random
> access any faster. This patch omits such entries from new map files and
> ignores them when reading old map files. This patch greatly helped with
> memory usage on a compressed HBase table.
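
For readers skimming the thread, the two tweaks in the description can be sketched roughly as follows. This is an illustration only, not the code in the attached patches: the class name `MapFileIndexSketch`, the methods `nextCapacity` and `buildIndex`, and the `MAX_SLACK` value of 1024 are all assumptions made up for the example, and the index keys that would sit alongside each position are omitted for brevity.

```java
import java.util.Arrays;

/** Illustrative sketch only; names and constants are hypothetical. */
class MapFileIndexSketch {
    // Assumed cap on unused slots; the bound used by the real patch is not stated here.
    static final int MAX_SLACK = 1024;

    /** Grow by a factor of 3/2, but never add more than MAX_SLACK spare slots. */
    static int nextCapacity(int current) {
        int grown = Math.max(current + 1, current + current / 2);
        return Math.min(grown, current + MAX_SLACK);
    }

    /**
     * Build the in-memory position array from the offsets read out of an
     * index file, skipping any entry whose offset repeats the previous one.
     */
    static long[] buildIndex(long[] offsetsFromIndexFile) {
        long[] positions = new long[4];
        int count = 0;
        for (long offset : offsetsFromIndexFile) {
            // An entry pointing at the same block as the previous entry
            // does not make seeking any faster, so it is dropped.
            if (count > 0 && offset == positions[count - 1]) {
                continue;
            }
            if (count == positions.length) {
                positions = Arrays.copyOf(positions, nextCapacity(positions.length));
            }
            positions[count++] = offset;
        }
        // Trim once the final length is known.
        return Arrays.copyOf(positions, count);
    }
}
```

With a small array the growth is still 3/2 (a capacity of 10 becomes 15), while a large one is capped (10000 becomes 11024 rather than 15000). Feeding `buildIndex` the offsets `0, 0, 0, 512, 512, 2048` keeps only `0, 512, 2048`.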