[ https://issues.apache.org/jira/browse/HBASE-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837464#action_12837464 ]
ryan rawson commented on HBASE-2251:
------------------------------------

Zipf is OK, but it may not accurately represent common data use patterns. What I am trying to say here is that big cells represent one scaling challenge, and small cells a different one. Users often have one or the other, but not a whole lot in between. Our systems use either small cells or huge ones (> 2k).

The small cells place a higher load, one specific example being the node objects in the memstore kvset. This is what was causing the clone issues. Hence we need to accurately simulate objects in the 1-50ish byte range and in the 1000-12000 (or larger) byte range. Using a zipf distribution within each of those ranges would be reasonable, I think.

> PE defaults to 1k rows - uncommon use case, and easy to hit benchmarks
> ----------------------------------------------------------------------
>
>                 Key: HBASE-2251
>                 URL: https://issues.apache.org/jira/browse/HBASE-2251
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: ryan rawson
>             Fix For: 0.20.4, 0.21.0
>
>
> The PerformanceEvaluation uses 1k rows, which I would argue is uncommon, and
> also provides an easy to hit performance goal. Most of the harder
> performance issues happen at the low and high ends of cell size. In our own
> application, our key sizes range from 4 bytes to maybe 100 bytes. Very
> rarely 1000 bytes. If we have large values, they are VERY large, on the
> order of multiple kilobytes.
> Recently a change went into HBase that ran well with PE because the overhead
> of 1k rows is very low in memory, but with small rows, performance would be
> expected to take a much bigger hit. This is because the per-value overhead
> (e.g. node objects of the skip list/memstore) is amortized more with 1k
> values.
> We should make this a tunable setting, and have a low default. I would argue
> for a 10-30 byte default.
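
For illustration only, here is a minimal sketch of the two-band idea from the comment: pick the small band or the large band, then draw a size from a Zipf-like distribution inside that band. Nothing here is existing PerformanceEvaluation code. The class names, the mixing fraction, the Zipf exponent of 1.0, and the choice to make the smallest size in each band the most likely one are all assumptions; only the band limits (1-50 and 1000-12000 bytes) come from the comment.

```java
import java.util.Random;

/**
 * Hypothetical sketch (not part of PerformanceEvaluation): draws value sizes
 * from two bands, with a Zipf-like distribution inside each band.
 */
final class BimodalValueSizeSampler {
    private final Random random;
    private final double smallFraction; // assumed probability of picking the small band
    private final ZipfBand small;
    private final ZipfBand large;

    BimodalValueSizeSampler(long seed, double smallFraction) {
        this.random = new Random(seed);
        this.smallFraction = smallFraction;
        // Band limits follow the comment; the exponent 1.0 is an assumption.
        this.small = new ZipfBand(1, 50, 1.0);
        this.large = new ZipfBand(1000, 12000, 1.0);
    }

    /** Returns a value size in bytes from one of the two bands. */
    int nextSize() {
        ZipfBand band = random.nextDouble() < smallFraction ? small : large;
        return band.sample(random.nextDouble());
    }

    /** Zipf over ranks 1..n mapped onto sizes min..max; rank 1 maps to min. */
    static final class ZipfBand {
        private final int min;
        private final double[] cdf;

        ZipfBand(int min, int max, double exponent) {
            this.min = min;
            int n = max - min + 1;
            this.cdf = new double[n];
            double sum = 0.0;
            // Accumulate unnormalized weights 1 / rank^exponent.
            for (int rank = 1; rank <= n; rank++) {
                sum += 1.0 / Math.pow(rank, exponent);
                cdf[rank - 1] = sum;
            }
            // Normalize so the last entry is exactly 1.0.
            for (int i = 0; i < n; i++) {
                cdf[i] /= sum;
            }
        }

        /** Maps a uniform u in [0, 1) to a size via inverse-CDF lookup. */
        int sample(double u) {
            int lo = 0;
            int hi = cdf.length - 1;
            // Binary search for the first index whose cumulative weight is >= u.
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (cdf[mid] < u) {
                    lo = mid + 1;
                } else {
                    hi = mid;
                }
            }
            return min + lo;
        }
    }
}
```

A benchmark could call new BimodalValueSizeSampler(seed, 0.5).nextSize() once per cell to choose the value length. The fixed seed keeps runs repeatable, and the fraction controls how much of the load falls on the small-cell path where per-value overhead dominates.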