[
https://issues.apache.org/jira/browse/HADOOP-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sharad Agarwal updated HADOOP-5369:
-----------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
I committed this. Thanks Ben!
> Small tweaks to reduce MapFile index size
> -----------------------------------------
>
> Key: HADOOP-5369
> URL: https://issues.apache.org/jira/browse/HADOOP-5369
> Project: Hadoop Core
> Issue Type: Improvement
> Reporter: Ben Maurer
> Assignee: Ben Maurer
> Fix For: 0.21.0
>
> Attachments: mapfile.patch, smaller_mapfile.patch,
> smaller_mapfile.patch, smaller_mapfile.patch, smaller_mapfile.patch,
> smaller_mapfile.patch
>
>
> Two minor tweaks can help reduce the memory overhead of the MapFile index a
> bit:
> 1) Because the index file is a sequence file, its length is not known. That
> means the index is built using the standard "multiply the size of the buffer
> on overflow" approach with a factor of 3/2. With small keys, the slack in the
> index can be substantial. This patch puts a constant upper bound on the amount
> of slack allowed.
> 2) In block-compressed map files the index file often has entries with the
> same offset (because the compressed block held more keys than the index
> interval). Entries with identical offsets do not help MapFile do random access
> any faster. This patch eliminates these entries from new map files, and
> ignores them while reading old map files. This patch greatly helped with
> memory usage on a compressed HBase table.
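
The two tweaks described above can be illustrated with a minimal Java sketch. This is not the code from the attached patch: the class name MapFileIndexSketch, the MAX_SLACK value, and both method names are hypothetical, chosen only to show the idea. The first method grows a buffer by 3/2 but caps the added space at a constant; the second keeps only the first index entry for each distinct offset, assuming the offsets arrive in non-decreasing order as they would in an index file.

```java
import java.util.Arrays;

/** Illustrative sketch only; names and constants are assumptions, not the patch. */
class MapFileIndexSketch {

    // Hypothetical cap on unused slots added per resize; the patch may use another value.
    static final int MAX_SLACK = 1024;

    /** Grow by a factor of 3/2, but never add more than MAX_SLACK slots at once. */
    static int nextCapacity(int capacity) {
        // Always grow by at least one slot so tiny buffers make progress.
        int grown = Math.max(capacity + 1, capacity + capacity / 2);
        return Math.min(grown, capacity + MAX_SLACK);
    }

    /** Keep the first entry of each run of equal offsets (input is non-decreasing). */
    static long[] dropRepeatedOffsets(long[] offsets) {
        long[] kept = new long[offsets.length];
        int n = 0;
        for (long offset : offsets) {
            if (n == 0 || offset != kept[n - 1]) {
                kept[n++] = offset;
            }
        }
        return Arrays.copyOf(kept, n);
    }

    public static void main(String[] args) {
        System.out.println(nextCapacity(1000));
        System.out.println(nextCapacity(4096));
        System.out.println(Arrays.toString(
                dropRepeatedOffsets(new long[] {0L, 0L, 0L, 512L, 512L, 900L})));
    }
}
```

In a real index each offset is paired with a key, so dropping an entry means dropping the key as well; the sketch tracks offsets alone to keep the example short.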