[
https://issues.apache.org/jira/browse/HBASE-19506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309415#comment-16309415
]
Anastasia Braginsky commented on HBASE-19506:
---------------------------------------------
Now we are coming back to this idea. Looking deeper into the details: the size of a
cell-representation is 20 bytes and the chunk size is 2MB (2097152 bytes), therefore
one chunk can hold the representations of about 104857 cells (2097152 / 20 = 104857.6).
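A minimal sketch of the capacity arithmetic above. The 20-byte representation and the 2MB chunk size come from this comment; the class and method names are hypothetical and are not HBase APIs.

```java
// Hypothetical helper (not part of HBase): how many cell-representations fit in one index chunk.
public class IndexChunkCapacity {
    // Figures taken from the comment: 20-byte cell-representation, 2MB default chunk.
    static final int CELL_REPR_BYTES = 20;
    static final int DEFAULT_CHUNK_BYTES = 2 * 1024 * 1024; // 2097152

    // Whole representations that fit; integer division drops the fractional .6 of a cell.
    static int capacity(int chunkBytes, int reprBytes) {
        return chunkBytes / reprBytes;
    }

    public static void main(String[] args) {
        System.out.println(capacity(DEFAULT_CHUNK_BYTES, CELL_REPR_BYTES)); // prints 104857
    }
}
```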
How many cells are inserted before an in-memory flush depends heavily on the
workload. However, taking a rough average, let's say the cell size is 1KB and we
flush in-memory every 12.8MB (10% of 128MB); then 12.8MB / 1KB ≈ 12.8K, i.e. about
12800 cells are written per in-memory flush (in this case).
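The same estimate as a small sketch, assuming binary units (1MB = 1024KB) and an index chunk that holds 104857 representations. With binary units the count comes out at 13107 rather than the rounder 12800 above, which fills roughly 12.5% of one 2MB index chunk. The names are hypothetical.

```java
// Hypothetical estimate (not HBase code): cells per in-memory flush and index-chunk fill ratio.
public class CellsPerInMemoryFlush {
    static final long MEMSTORE_BYTES = 128L * 1024 * 1024; // 128MB memstore
    static final int IN_MEMORY_FLUSH_PERCENT = 10;          // flush in-memory at 10%
    static final long AVG_CELL_BYTES = 1024;                // assumed 1KB average cell
    static final long INDEX_CHUNK_CAPACITY = (2L * 1024 * 1024) / 20; // 104857 representations

    // Number of cells written before the in-memory flush threshold is reached.
    static long cellsPerFlush(long memstoreBytes, int flushPercent, long cellBytes) {
        long flushThreshold = memstoreBytes * flushPercent / 100; // 13421772 bytes here
        return flushThreshold / cellBytes;
    }

    public static void main(String[] args) {
        long cells = cellsPerFlush(MEMSTORE_BYTES, IN_MEMORY_FLUSH_PERCENT, AVG_CELL_BYTES);
        double utilization = (double) cells / INDEX_CHUNK_CAPACITY;
        System.out.printf("cells=%d, index chunk utilization=%.1f%%%n", cells, utilization * 100);
    }
}
```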
After that, every 5 immutable segments in the pipeline are compacted, so 5
under-utilized index chunks are released and one index chunk holding about 52800
cell-representations is allocated (which is about half of its capacity). So it looks
like there is indeed some under-utilization of index chunks; however, there are at
most 5 index chunks per memstore, so the impact may not be very significant.
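To put numbers on the under-utilization, here is a rough sketch that computes the unused bytes in 2MB index chunks before and after compaction. It assumes, as the comment implies, one index chunk per immutable segment, and it reuses the 12800 and 52800 cell counts stated above; the class and method names are hypothetical.

```java
// Hypothetical calculation (not HBase code): reserved-but-unused bytes in index chunks.
public class IndexChunkUtilization {
    static final int CELL_REPR_BYTES = 20;
    static final int DEFAULT_CHUNK_BYTES = 2 * 1024 * 1024;

    // Fraction of a single chunk occupied by 'cells' representations.
    static double utilization(long cells, int chunkBytes) {
        return (double) (cells * CELL_REPR_BYTES) / chunkBytes;
    }

    // Unused bytes across the chunk(s) needed to hold 'cells' representations.
    static long wastedBytes(long cells, int chunkBytes) {
        long perChunk = chunkBytes / CELL_REPR_BYTES;
        long chunks = (cells + perChunk - 1) / perChunk; // ceiling division
        return chunks * chunkBytes - cells * CELL_REPR_BYTES;
    }

    public static void main(String[] args) {
        // Before compaction: 5 segments of ~12800 cells, one index chunk each.
        long before = 5 * wastedBytes(12_800, DEFAULT_CHUNK_BYTES);
        // After compaction: one index chunk with ~52800 cells (figure from the comment).
        long after = wastedBytes(52_800, DEFAULT_CHUNK_BYTES);
        System.out.printf("unused before=%d bytes, after=%d bytes, after-utilization=%.2f%n",
                before, after, utilization(52_800, DEFAULT_CHUNK_BYTES));
    }
}
```

Under these assumptions roughly 9MB of index space is idle while the 5 segments sit in the pipeline, and about 1MB after they are compacted, which is consistent with the "at most 5 index chunks per memstore" observation.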
As a solution, we suggest creating another pool for "small" chunks in
ChunkCreator, say chunks of 256KB. This also means we will need to define a new
type of chunk. However, it is very important to avoid on-demand allocation, so
this "small-chunks" pool can be pre-allocated and its chunks can be reused.
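A minimal sketch of the pre-allocated small-chunk pool idea, assuming chunks are plain direct ByteBuffers kept in a bounded free list. This is not ChunkCreator's actual API; SmallChunkPool, acquire, and release are hypothetical names. When the pool is exhausted the sketch returns null and leaves the fallback (for example, using a regular 2MB chunk) to the caller, so no off-heap allocation happens after construction.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical pool of fixed-size "small" off-heap chunks (not HBase's ChunkCreator).
public class SmallChunkPool {
    private final int chunkSize;
    private final BlockingQueue<ByteBuffer> free;

    public SmallChunkPool(int chunkSize, int poolSize) {
        this.chunkSize = chunkSize;
        this.free = new ArrayBlockingQueue<>(poolSize);
        // Pre-allocate all chunks up front so the write path never allocates off-heap memory.
        for (int i = 0; i < poolSize; i++) {
            free.add(ByteBuffer.allocateDirect(chunkSize));
        }
    }

    /** Returns a pooled chunk, or null if the pool is exhausted (caller picks the fallback). */
    public ByteBuffer acquire() {
        return free.poll();
    }

    /** Resets a chunk's position/limit and puts it back for reuse. */
    public void release(ByteBuffer chunk) {
        if (chunk.capacity() != chunkSize) {
            throw new IllegalArgumentException("chunk does not belong to this pool");
        }
        chunk.clear();
        if (!free.offer(chunk)) {
            throw new IllegalStateException("more chunks released than were pre-allocated");
        }
    }

    public int available() {
        return free.size();
    }

    public static void main(String[] args) {
        SmallChunkPool pool = new SmallChunkPool(256 * 1024, 4); // 256KB chunks, as suggested above
        ByteBuffer chunk = pool.acquire();
        System.out.println(pool.available()); // prints 3
        pool.release(chunk);
        System.out.println(pool.available()); // prints 4
    }
}
```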
> Support variable sized chunks from ChunkCreator
> -----------------------------------------------
>
> Key: HBASE-19506
> URL: https://issues.apache.org/jira/browse/HBASE-19506
> Project: HBase
> Issue Type: Sub-task
> Reporter: Anastasia Braginsky
>
> When CellChunkMap is created, it allocates a special index chunk (or chunks)
> where the array of cell-representations is stored. When the number of
> cell-representations is small, it is preferable to allocate a chunk smaller
> than the default size of 2MB.
> On the other hand, such "non-standard size" chunks cannot be taken from the pool,
> and on-demand allocations off-heap are costly. So this JIRA is about
> investigating the trade-off between memory usage and final performance.
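One possible sizing policy for the trade-off described in the issue, sketched under stated assumptions: use a single 256KB small chunk when all representations fit in it (256KB / 20 bytes = 13107 of them), otherwise use as many default 2MB chunks as needed. The policy, the class, and its names are hypothetical; the issue does not specify how the choice would be made.

```java
// Hypothetical chunk-sizing policy for a CellChunkMap index (illustration only, not HBase code).
public class IndexChunkSizing {
    static final int CELL_REPR_BYTES = 20;
    static final int SMALL_CHUNK_BYTES = 256 * 1024;       // proposed "small" chunk size
    static final int DEFAULT_CHUNK_BYTES = 2 * 1024 * 1024; // default chunk size

    static final class Plan {
        final int chunkBytes;
        final int chunkCount;

        Plan(int chunkBytes, int chunkCount) {
            this.chunkBytes = chunkBytes;
            this.chunkCount = chunkCount;
        }

        @Override
        public String toString() {
            return chunkCount + " x " + chunkBytes + " bytes";
        }
    }

    // Prefer one small chunk when everything fits; otherwise fall back to default-size chunks.
    static Plan plan(int cells) {
        long needed = (long) cells * CELL_REPR_BYTES;
        if (needed <= SMALL_CHUNK_BYTES) {
            return new Plan(SMALL_CHUNK_BYTES, 1);
        }
        int perChunk = DEFAULT_CHUNK_BYTES / CELL_REPR_BYTES; // 104857
        int count = (int) ((cells + (long) perChunk - 1) / perChunk); // ceiling division
        return new Plan(DEFAULT_CHUNK_BYTES, count);
    }

    public static void main(String[] args) {
        System.out.println(plan(12_800));  // one in-memory flush: fits in a small chunk
        System.out.println(plan(52_800));  // after compaction: needs a default chunk
        System.out.println(plan(200_000)); // larger index: several default chunks
    }
}
```

With this policy, the freshly flushed segments from the estimate above would each take a 256KB chunk instead of a 2MB one, while compacted segments keep using the regular pooled chunks.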
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)