[ 
https://issues.apache.org/jira/browse/HBASE-9553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770389#comment-13770389
 ] 

Matt Corgan commented on HBASE-9553:
------------------------------------

I don't know the code-level implementation details of any of the garbage 
collectors, but I imagine they already do this to some extent by dividing the 
heap into regions for different chunk sizes and placing blocks into slots 
slightly bigger than they need, which effectively pads each block with empty 
space.  Maybe not for tiny objects, but possibly for bigger ones.

I also worry that it would be hard to pick a single size to round all the 
blocks to, because HBase allows a configurable block size and encoding per 
table.  Even if all tables use the default block size and encoding, the 
encoded blocks will differ in size depending on the nature of the data in each 
table.
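
To make the concern above concrete, here is a minimal sketch (not HBase code) 
that rounds raw block sizes up to a multiple of a padding quantum and lists the 
distinct sizes that result.  The class name, the method names, the 1 KB quantum 
and the sample sizes are all assumptions chosen for illustration; they stand in 
for blocks from tables with different configured block sizes and encodings.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names exist in HBase.
final class PaddingSketch {

    // Round size up to the next multiple of quantum (quantum must be > 0).
    static int roundUp(int size, int quantum) {
        return ((size + quantum - 1) / quantum) * quantum;
    }

    // Collect the distinct padded sizes produced by a set of raw block sizes.
    static Set<Integer> distinctPaddedSizes(int[] rawSizes, int quantum) {
        Set<Integer> padded = new TreeSet<>();
        for (int size : rawSizes) {
            padded.add(roundUp(size, quantum));
        }
        return padded;
    }

    public static void main(String[] args) {
        // Made-up sizes: two blocks from a table with 64 KB blocks, two from
        // a table with 16 KB blocks, and one smaller encoded block.
        int[] rawSizes = {65_600, 65_550, 16_420, 16_500, 9_300};
        // Prints [10240, 17408, 66560]
        System.out.println(distinctPaddedSizes(rawSizes, 1024));
    }
}
```

Blocks from the same table collapse to one size, but the cache as a whole still 
holds three different sizes, which is why a single rounding size looks hard to 
choose.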

It would be a good question for the Mechanical Sympathy mailing list.
                
> Pad HFile blocks to a fixed size before placing them into the blockcache
> ------------------------------------------------------------------------
>
>                 Key: HBASE-9553
>                 URL: https://issues.apache.org/jira/browse/HBASE-9553
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>
> In order to make it easy on the garbage collector and to avoid full 
> compaction phases we should make sure that all (or at least a large 
> percentage) of the HFile blocks cached in the block cache are exactly the 
> same size.
> Currently an HFile block is typically slightly larger than the declared block 
> size, because the block grows to accommodate its last KV. The padding 
> would be a ColumnFamily option. In many cases 100 bytes would probably be a 
> good value to make all blocks exactly the same size (but of course it depends 
> on the max size of the KVs).
> This does not have to be perfect. The more of the blocks evicted from and 
> placed into the block cache that are exactly the same size, the easier it 
> should be on the GC.
> Thoughts?
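
As a rough illustration of the proposal quoted above (a sketch only, not HBase 
code), the padded size could be computed as the declared block size plus the 
per-ColumnFamily padding, with oversized blocks left untouched.  The class and 
method names are hypothetical, and padding with trailing zero bytes is an 
assumption; the issue does not say how the padding would be represented.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in HBase.
final class FixedSizePadding {

    // Size a block would occupy in the cache: declaredBlockSize + padding when
    // the block fits, otherwise its actual size (it cannot be shrunk).
    static int paddedSize(int actualSize, int declaredBlockSize, int padding) {
        int target = declaredBlockSize + padding;
        return actualSize <= target ? target : actualSize;
    }

    // Return a copy of the block extended with zero bytes to the padded size,
    // or the block itself when no padding is needed.
    static byte[] pad(byte[] block, int declaredBlockSize, int padding) {
        int size = paddedSize(block.length, declaredBlockSize, padding);
        return size == block.length ? block : Arrays.copyOf(block, size);
    }

    public static void main(String[] args) {
        int declared = 64 * 1024; // 65536 bytes
        int padding = 100;        // the value suggested in the issue
        // A block that overshoots by 44 bytes is padded to 65636.
        System.out.println(pad(new byte[65_580], declared, padding).length);
        // A block that overshoots by more than the padding keeps its size.
        System.out.println(pad(new byte[65_700], declared, padding).length);
    }
}
```

Note that this sketch also pads blocks smaller than the declared size (for 
example the last block of a file) up to the same target, which trades some 
cache space for uniformity.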

