[ https://issues.apache.org/jira/browse/HBASE-15248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16106811#comment-16106811 ]

Anoop Sam John commented on HBASE-15248:
----------------------------------------

The block header we know is 33 bytes, and there are 13 bytes of metadata.
Do we know how many bytes the CRCs add? Probably yes. All of this is extra on
top of the block size the user configured (4 KB), but at least we know how many
extra bytes there are.
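
For illustration, a rough back-of-the-envelope sketch in Java of the footprint of
one "4 KB" data block, using the numbers above (33-byte header, 13 bytes of
metadata) and assuming the default CRC32 checksumming at 4 bytes per 16 KB chunk.
Whether each piece lands on disk or only in the cache entry varies, so treat the
constants and the result as an estimate, not the writer's actual accounting.

    public class BlockOverheadEstimate {
      static final int CONFIGURED_BLOCK_SIZE = 4 * 1024; // BLOCKSIZE from the table schema
      static final int HEADER_SIZE = 33;                  // HFile block header (per the comment)
      static final int META_SIZE = 13;                    // extra per-block metadata (per the comment)
      static final int CHECKSUM_CHUNK = 16 * 1024;        // assumed default bytesPerChecksum
      static final int CHECKSUM_SIZE = 4;                 // CRC32 is 4 bytes per chunk

      public static void main(String[] args) {
        int payload = CONFIGURED_BLOCK_SIZE;              // and often more, see below
        int chunks = (payload + HEADER_SIZE + CHECKSUM_CHUNK - 1) / CHECKSUM_CHUNK;
        int total = HEADER_SIZE + payload + META_SIZE + chunks * CHECKSUM_SIZE;
        System.out.println("~" + total + " bytes for a 4096-byte payload");
        // Prints roughly 4146 bytes: already past a single 4 KB page even before
        // the block overflows its configured payload size.
      }
    }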
The biggest issue is that our block size is NOT an upper cap. It is not the max
size a block will have; it is the min size a block will have. The check for
ending the current block and starting a new one is done without considering the
size of the current cell. One more thing: we don't want to write duplicate cells
(same key but different mvcc numbers) into different blocks. IMO this is the
main concern area for making the block size predictable. Because of this
unpredictability, users end up adding some extra headroom to every block size.
Remember that the default BucketCache bucket sizes are 5 KB, 9 KB etc. for block
sizes of 4 KB and 8 KB; that extra 1 KB per block is there to cover all of these
extras.
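
For illustration only, a minimal sketch (not the actual HFileWriterImpl code, and
with String keys and plain counters standing in for cells and the block writer)
of the boundary behaviour described above: the block is closed only after the
bytes already written reach the limit, the incoming cell's size is never part of
the check, and duplicate keys are never split across blocks, so the configured
size acts as a floor rather than a ceiling.

    import java.util.Objects;

    public class BlockBoundarySketch {
      private final int configuredBlockSize = 4096;
      private int bytesInCurrentBlock = 0;
      private int blocksFinished = 0;
      private String lastKey = null;        // stand-in for the last cell's key

      void append(String key, int cellSize) {
        boolean pastLimit = bytesInCurrentBlock >= configuredBlockSize;
        boolean sameKeyAsLast = Objects.equals(key, lastKey); // same key, different mvcc
        if (pastLimit && !sameKeyAsLast) {
          // The block is closed only AFTER it has already reached the limit,
          // and never between two versions of the same key.
          blocksFinished++;
          bytesInCurrentBlock = 0;
        }
        bytesInCurrentBlock += cellSize;     // incoming cell size never considered above
        lastKey = key;
      }

      public static void main(String[] args) {
        BlockBoundarySketch w = new BlockBoundarySketch();
        for (int i = 0; i < 5; i++) {
          w.append("row-" + i, 900);         // 5 x 900 = 4500 bytes, still one block
        }
        System.out.println("blocks closed: " + w.blocksFinished);          // 0
        System.out.println("current block size: " + w.bytesInCurrentBlock); // 4500 > 4096
        // With duplicate keys (same key, different mvcc) the overshoot can be
        // even larger, since the block will not be closed between them.
      }
    }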

> BLOCKSIZE 4k should result in 4096 bytes on disk; i.e. fit inside a 
> BucketCache 'block' of 4k
> ---------------------------------------------------------------------------------------------
>
>                 Key: HBASE-15248
>                 URL: https://issues.apache.org/jira/browse/HBASE-15248
>             Project: HBase
>          Issue Type: Sub-task
>          Components: BucketCache
>            Reporter: stack
>
> Chatting w/ a gentleman named Daniel Pol who is messing w/ bucketcache, he 
> wants blocks to be the size specified in the configuration and no bigger. His 
> hardware setup fetches pages of 4k, and so a block that has 4k of payload 
> but then also a header and the header of the next block (which helps figure 
> out what's next when scanning) ends up being 4203 bytes or something, and 
> this then translates into two seeks per block fetch.
> This issue is about what it would take to stay inside our configured size 
> boundary writing out blocks.
> If that is not possible, give back a better signal on what to do so you can 
> fit inside a particular constraint.


