[ https://issues.apache.org/jira/browse/HBASE-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837309#action_12837309 ]
Andrew Purtell commented on HBASE-2251:
---------------------------------------
A Zipf's-law distribution is good for simulating web-sourced content. We run
internal performance benchmarks based on one, so +1 on that notion.
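For illustration, here is a minimal sketch of a Zipf sampler that a benchmark could use to pick value sizes or key ranks. The class name ZipfSampler, its constructor parameters, and the inverse-CDF approach are assumptions made for this example; nothing like it is claimed to exist in PE.

```java
import java.util.Random;

/** Hypothetical helper, not part of PerformanceEvaluation. */
final class ZipfSampler {
    private final double[] cdf;   // cumulative probability for ranks 1..n
    private final Random random;

    ZipfSampler(int n, double exponent, long seed) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        cdf = new double[n];
        double sum = 0.0;
        // Unnormalized weight of rank k is 1 / k^exponent.
        for (int rank = 1; rank <= n; rank++) {
            sum += 1.0 / Math.pow(rank, exponent);
            cdf[rank - 1] = sum;
        }
        for (int i = 0; i < n; i++) {
            cdf[i] /= sum;
        }
        cdf[n - 1] = 1.0; // guard against floating-point rounding
        random = new Random(seed);
    }

    /** Returns a rank in [1, n]; low ranks are drawn far more often. */
    int next() {
        double u = random.nextDouble();
        int lo = 0;
        int hi = cdf.length - 1;
        // Binary search for the first index whose cumulative probability >= u.
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cdf[mid] < u) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo + 1;
    }
}
```

A benchmark could map the returned rank to a value size in bytes or to a row key, depending on whether it wants skewed sizes or skewed access.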
We should also include runs where all data items are serialized longs, another
use case that I expect will be common. I think this is what Ryan was
getting at.
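To make the size concrete: a serialized long is an 8-byte value, far smaller than the 1k default. The sketch below uses only java.nio; the class name LongValues is an assumption for this example. In HBase itself, Bytes.toBytes(long) would normally be used for the same encoding.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper showing an 8-byte big-endian long encoding. */
final class LongValues {
    /** Encodes a long as 8 bytes, most significant byte first. */
    static byte[] toBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    /** Decodes the first 8 bytes of the array back into a long. */
    static long toLong(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }
}
```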
Also, while we're here, I wish PE had a mode that, given no arguments other
than the number of clients, runs the full suite of performance tests and dumps
the results as plain text, and also as XML if a command-line flag is set. Then
I could write a Hudson plugin that fails a build if performance falls outside
some threshold. What do you think?
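A rough sketch of the threshold check such a plugin might perform, assuming the results have already been parsed into maps of test name to elapsed milliseconds. The class PerfGate, the method regressions, and the tolerance parameter are all hypothetical; no output format for PE is assumed here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical build gate; not an existing Hudson or HBase API. */
final class PerfGate {
    /**
     * Returns the sorted names of tests whose current elapsed time exceeds
     * the baseline by more than the given fraction (0.2 means 20% slower).
     * Tests missing from the current run are skipped.
     */
    static List<String> regressions(Map<String, Long> baselineMs,
                                    Map<String, Long> currentMs,
                                    double tolerance) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Long> entry : baselineMs.entrySet()) {
            Long now = currentMs.get(entry.getKey());
            if (now == null) {
                continue; // test was not run this time
            }
            if (now > entry.getValue() * (1.0 + tolerance)) {
                failed.add(entry.getKey());
            }
        }
        Collections.sort(failed);
        return failed;
    }
}
```

A build would fail when the returned list is non-empty. Keeping the comparison separate from parsing means the same check works for either the plain-text or the XML dump.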
> PE defaults to 1k rows - uncommon use case, and easy to hit benchmarks
> ----------------------------------------------------------------------
>
> Key: HBASE-2251
> URL: https://issues.apache.org/jira/browse/HBASE-2251
> Project: Hadoop HBase
> Issue Type: Bug
> Reporter: ryan rawson
> Fix For: 0.20.4, 0.21.0
>
>
> The PerformanceEvaluation uses 1k rows, which I would argue is uncommon and
> also provides an easy-to-hit performance goal. Most of the harder
> performance issues happen at the low and high ends of cell size. In our own
> application, our key sizes range from 4 bytes to maybe 100 bytes, very
> rarely 1000 bytes. If we have large values, they are VERY large, like
> multiple k in size.
> Recently a change went into HBase that ran well with PE because the overhead
> of 1k rows is very low in memory, but with small rows, performance
> would take a much bigger hit. This is because the per-value overhead
> (e.g., node objects of the skip list/memstore) is amortized more with 1k
> values.
> We should make this a tunable setting, and have a low default. I would argue
> for a 10-30 byte default.
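The amortization argument in the description can be made concrete with a little arithmetic. The sketch below computes the share of memory taken by fixed per-value overhead; the class name OverheadShare and the 100-byte overhead used in the note that follows are assumptions for illustration, not measured HBase numbers.

```java
/** Hypothetical helper for the amortization arithmetic. */
final class OverheadShare {
    /** Fraction of total bytes spent on fixed per-value overhead. */
    static double overheadFraction(int overheadBytes, int valueBytes) {
        return (double) overheadBytes / (overheadBytes + valueBytes);
    }
}
```

With an assumed 100 bytes of overhead per value, a 1000-byte value spends about 9% of its footprint on overhead (100/1100), while a 10-byte value spends about 91% (100/110). A change that adds per-value cost is therefore barely visible at 1k and dominant at 10 bytes.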