Small tweaks to reduce MapFile index size
-----------------------------------------
Key: HADOOP-5369
URL: https://issues.apache.org/jira/browse/HADOOP-5369
Project: Hadoop Core
Issue Type: Improvement
Reporter: Ben Maurer
Attachments: smaller_mapfile.patch
Two minor tweaks can help reduce the memory overhead of the MapFile index a bit:
1) Because the index file is a sequence file, its length is not known in
advance. That means the index is built using the standard "multiply the size of
the buffer on overflow" approach, with a factor of 3/2. With small keys, the
slack in the index can be substantial. This patch puts a constant upper bound
on the amount of slack allowed.
2) In block-compressed map files, the index file often has entries with the same
offset (because the compressed block held more keys than the index interval).
The entries with identical offsets do not help MapFile do random access any
faster. This patch eliminates these entries from new map files and ignores
them while reading old map files. This patch greatly helped with memory usage
on a compressed HBase table.
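
The two tweaks can be sketched roughly as follows. This is an illustration
only, not code from smaller_mapfile.patch: the class IndexSketch, the constant
MAX_SLACK and its value, and the methods nextCapacity and add are all
hypothetical names, and the real index also stores keys, which are omitted
here for brevity.

```java
import java.util.Arrays;

// Hypothetical, simplified in-memory index that tracks only block offsets.
final class IndexSketch {
    // Assumed cap on unused slots; the actual bound in the patch may differ.
    static final int MAX_SLACK = 1024;

    private long[] positions = new long[16];
    private int count = 0;
    private long lastPosition = -1;

    // Tweak 1: grow by a factor of 3/2, but never add more than MAX_SLACK slots.
    static int nextCapacity(int current) {
        int grown = current + (current >> 1);
        int capped = current + MAX_SLACK;
        // Always grow by at least one slot so tiny arrays still make progress.
        return Math.max(current + 1, Math.min(grown, capped));
    }

    // Tweak 2: skip an entry whose offset equals the previous entry's offset.
    // Returns true if the entry was stored, false if it was ignored.
    boolean add(long position) {
        if (count > 0 && position == lastPosition) {
            return false;
        }
        if (count == positions.length) {
            positions = Arrays.copyOf(positions, nextCapacity(positions.length));
        }
        positions[count++] = position;
        lastPosition = position;
        return true;
    }

    int size() {
        return count;
    }

    int capacity() {
        return positions.length;
    }
}
```

With the assumed cap of 1024, an array of 16 slots still grows to 24, while an
array of 1,000,000 slots grows to 1,001,024 instead of 1,500,000. Feeding the
offsets 0, 0, 100, 100, 250 through add keeps only three entries.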
--
This message is automatically generated by JIRA.