[ 
https://issues.apache.org/jira/browse/HBASE-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837464#action_12837464
 ] 

ryan rawson commented on HBASE-2251:
------------------------------------

Zipf is ok, but it may not accurately represent common data use patterns.

What I am trying to say here is that big cells represent one scaling challenge, 
and small cells a different one.  Users often have one or the other, but not a 
whole lot in between.  Our systems use either small cells or huge ones (> 2k).  
The small cells place a higher per-value load; one specific example is the node 
objects in the memstore kvset.  This is what was causing the clone issues.
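To see why small cells place a disproportionately higher load, here is a back-of-envelope sketch. The ~100-byte per-entry figure below is a hypothetical illustration of fixed memstore cost (skip-list node plus KeyValue object headers/references), not a measured HBase number:

```java
/** Hypothetical sketch of per-entry memstore overhead vs. value size. */
public class OverheadRatio {
    // Assumed fixed cost per entry: ConcurrentSkipListMap node plus
    // KeyValue object overhead. The 100-byte figure is illustrative only.
    static final int PER_ENTRY_OVERHEAD = 100;

    /** Fraction of total memory that is overhead rather than payload. */
    static double overheadFraction(int valueSize) {
        return (double) PER_ENTRY_OVERHEAD / (PER_ENTRY_OVERHEAD + valueSize);
    }

    public static void main(String[] args) {
        System.out.printf("20-byte value:   %.0f%% overhead%n",
            100 * overheadFraction(20));    // roughly 83%
        System.out.printf("1000-byte value: %.0f%% overhead%n",
            100 * overheadFraction(1000));  // roughly 9%
    }
}
```

With 1k values the fixed cost nearly disappears; with 20-byte values it dominates, which is why a benchmark built on 1k rows hides the problem.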

Hence we need to accurately simulate objects in the 1-50ish byte size range 
and the 1000-12000 (or larger) byte size range.  Using a Zipf distribution 
within each range would be reasonable, I think.
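One way the two-band Zipf idea could look in code — this is a sketch, not PE's actual implementation; the class name and the skew exponent of 1.0 are assumptions:

```java
import java.util.Random;

/** Sketch: sample cell value sizes from a bounded Zipf distribution,
 *  so sizes near the low end of each band are the most frequent. */
public class ZipfCellSizes {
    private final int min;
    private final double[] cdf;  // cumulative probabilities over [min, max]
    private final Random rnd;

    public ZipfCellSizes(int min, int max, double skew, long seed) {
        this.min = min;
        this.rnd = new Random(seed);
        int n = max - min + 1;
        cdf = new double[n];
        double sum = 0;
        for (int i = 0; i < n; i++) {
            sum += 1.0 / Math.pow(i + 1, skew);  // weight = 1 / rank^skew
            cdf[i] = sum;
        }
        for (int i = 0; i < n; i++) cdf[i] /= sum;  // normalize to a CDF
    }

    /** Returns a value size in [min, max]; small sizes are most likely. */
    public int next() {
        double u = rnd.nextDouble();
        int lo = 0, hi = cdf.length - 1;
        while (lo < hi) {  // binary search for the first cdf entry >= u
            int mid = (lo + hi) / 2;
            if (cdf[mid] < u) lo = mid + 1; else hi = mid;
        }
        return min + lo;
    }

    public static void main(String[] args) {
        // Two bands per the comment: tiny cells and big cells.
        ZipfCellSizes small = new ZipfCellSizes(1, 50, 1.0, 42);
        ZipfCellSizes big = new ZipfCellSizes(1000, 12000, 1.0, 42);
        for (int i = 0; i < 5; i++) {
            System.out.println(small.next() + " bytes / " + big.next() + " bytes");
        }
    }
}
```

A benchmark could then flip between the two samplers (or mix them with a fixed ratio) to cover both scaling regimes in one run.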

> PE defaults to 1k rows - uncommon use case, and easy to hit benchmarks
> ----------------------------------------------------------------------
>
>                 Key: HBASE-2251
>                 URL: https://issues.apache.org/jira/browse/HBASE-2251
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: ryan rawson
>             Fix For: 0.20.4, 0.21.0
>
>
> The PerformanceEvaluation uses 1k rows, which I would argue is uncommon, and 
> also provides an easy-to-hit performance goal.  Most of the harder 
> performance issues happen at the low and high ends of cell size.  In our own 
> application, our key sizes range from 4 bytes to maybe 100 bytes.  Very 
> rarely 1000 bytes.  If we have large values, they are VERY large, like 
> multiple k in size.
> Recently a change went into HBase that ran well with PE because the memory 
> overhead of 1k rows is very low, but with small rows the performance hit 
> would be much greater.  This is because the per-value overhead 
> (eg: node objects of the skip list/memstore) is amortized more with 1k 
> values. 
> We should make this a tunable setting, and have a low default.  I would argue 
> for a 10-30 byte default.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
