[
https://issues.apache.org/jira/browse/HBASE-4163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079153#comment-13079153
]
Nicolas Spiegelberg commented on HBASE-4163:
--------------------------------------------
My initial thought is to use the existing RegionSplitter utility. We just need
to create a custom SplitAlgorithm implementation class for the YCSB key
specification & tell the users to run:
{code}
bin/hbase org.apache.hadoop.hbase.util.RegionSplitter TABLE -c 200 -f FAMILY -D split.algorithm=YcsbSplit
{code}
to pre-create a table with 200 regions. To disable splitting, we can either
set hbase.hregion.max.filesize to a really high value or add a per-table
split config option.
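
As a rough illustration of the split-point math such a class might do, here is
a standalone sketch. It does not implement the real SplitAlgorithm interface
and has no HBase dependency. The class name YcsbSplitSketch and the method
splitPoints are hypothetical, and it assumes YCSB row keys look like "user"
followed by a decimal number with no leading zeros, with the leading digits
spread roughly evenly. That assumption should be checked against the actual
key distribution before relying on it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: computes region boundaries for keys of the assumed
// form "user<decimal number>". Not the HBase SplitAlgorithm interface.
class YcsbSplitSketch {

    // Assumed key prefix; not taken from the issue text.
    static final String PREFIX = "user";

    // Returns numRegions - 1 boundaries, in ascending lexicographic order.
    // Boundaries are three-digit prefixes spread evenly over [100, 1000),
    // since a number without leading zeros starts with a digit from 1 to 9.
    static String[] splitPoints(int numRegions) {
        if (numRegions < 2 || numRegions > 900) {
            throw new IllegalArgumentException(
                "numRegions must be between 2 and 900, got " + numRegions);
        }
        List<String> points = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            // Integer division floors the evenly spaced boundary.
            int boundary = 100 + (900 * i) / numRegions;
            points.add(PREFIX + boundary);
        }
        return points.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] points = splitPoints(200);
        System.out.println(points.length + " split points, first: "
            + points[0] + ", last: " + points[points.length - 1]);
    }
}
```

A real implementation would return these boundaries as byte[][] from the
SplitAlgorithm's split method so that RegionSplitter can create the regions.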
> Create Split Strategy for YCSB Benchmark
> ----------------------------------------
>
> Key: HBASE-4163
> URL: https://issues.apache.org/jira/browse/HBASE-4163
> Project: HBase
> Issue Type: Improvement
> Components: util
> Affects Versions: 0.90.3, 0.92.0
> Reporter: Nicolas Spiegelberg
> Assignee: Lars George
> Priority: Minor
> Labels: benchmark
>
> Talked with Lars about how we can make it easier for users to run the YCSB
> benchmarks against HBase & get realistic results. Currently, HBase is
> optimized for the random/uniform read/write case, which is the YCSB load.
> The initial reason we perform badly when users test against us is that
> they do not pre-split regions & set the split ratio really low. We need a
> one-line way for a user to create a table that is pre-split to 200 regions
> (or some decent number) by default & disable splitting. Realistically, this
> is how a uniform load cluster should scale, so it's not a hack. This will
> also give us a good use case to point to for how users should pre-split
> regions.