[ https://issues.apache.org/jira/browse/HBASE-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cosmin Lehene resolved HBASE-5402.
----------------------------------
Resolution: Later
Closing it with resolution Later. It may make sense to have a deterministic,
invertible hash, in case someone wants to play with it.
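One possible reading of the "deterministic, invertible hash" idea, offered only as an illustrative sketch: map each row index i in [0, totalRows) through an affine permutation key = (a * i + c) mod totalRows, with gcd(a, totalRows) = 1. Because that mapping is a bijection, randomWrite could still scatter its writes but could never produce the key collisions described in the issue, and the result is still a row number that fits the existing 10-byte key. The class name, constants and methods are hypothetical, not HBase APIs.

```java
import java.math.BigInteger;

/**
 * Hypothetical helper, not part of HBase: a deterministic, invertible
 * mapping of row indices onto row numbers in [0, n). It uses the affine
 * permutation key = (a * i + c) mod n, which is a bijection whenever
 * gcd(a, n) == 1, so n writes produce exactly n distinct keys.
 */
final class AffineRowPermutation {
  private final long n;
  private final long a;
  private final long c;
  private final long aInverse; // a^-1 mod n, used to undo the mapping

  AffineRowPermutation(long n, long a, long c) {
    if (n <= 0 || a <= 0 || a >= n) {
      throw new IllegalArgumentException("need n > 0 and 0 < a < n");
    }
    BigInteger bigN = BigInteger.valueOf(n);
    BigInteger bigA = BigInteger.valueOf(a);
    if (!bigA.gcd(bigN).equals(BigInteger.ONE)) {
      throw new IllegalArgumentException("a must be coprime to n");
    }
    this.n = n;
    this.a = a;
    this.c = Math.floorMod(c, n);
    // The inverse exists because gcd(a, n) == 1.
    this.aInverse = bigA.modInverse(bigN).longValue();
  }

  /** Maps index i in [0, n) to a scrambled row number in [0, n). */
  long forward(long i) {
    // multiplyExact throws on overflow instead of wrapping silently.
    return Math.floorMod(Math.multiplyExact(a, i) + c, n);
  }

  /** Recovers the index i that forward() mapped to key. */
  long inverse(long key) {
    return Math.floorMod(Math.multiplyExact(aInverse, Math.floorMod(key - c, n)), n);
  }

  public static void main(String[] args) {
    long totalRows = 10L * 1024 * 1024; // randomWrite with 10 clients
    // 7_654_321 is odd and not a multiple of 5, so it is coprime to 2^21 * 5.
    AffineRowPermutation perm = new AffineRowPermutation(totalRows, 7_654_321L, 12_345L);
    long i = 42;
    long key = perm.forward(i);
    System.out.println(i + " -> " + key + " -> " + perm.inverse(key));
  }
}
```

An affine map mixes only weakly (consecutive indices land a apart modulo totalRows); a stronger invertible mixer, such as a Feistel network over the index bits combined with cycle-walking, would scramble better at the cost of more code.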
> PerformanceEvaluation creates the wrong number of rows in randomWrite
> ---------------------------------------------------------------------
>
> Key: HBASE-5402
> URL: https://issues.apache.org/jira/browse/HBASE-5402
> Project: HBase
> Issue Type: Improvement
> Components: test
> Reporter: Oliver Meyn
> Labels: beginner
>
> The command line 'hbase org.apache.hadoop.hbase.PerformanceEvaluation
> randomWrite 10' should result in a table with 10 * (1024 * 1024) = 10485760
> rows. Instead, the randomWrite job reports writing exactly that many rows,
> but running rowcounter against the table reveals only, e.g., 6549899 rows.
> A second attempt to build the table produced a slightly different count
> (e.g. 6627689). I see a similar discrepancy when using 50 instead of 10
> clients (about 35% fewer rows than expected).
> Further experimentation shows that the problem is key collisions: removing
> the % totalRows in getRandomRow reduced the collisions (the table had ~8M
> rows instead of 6.6M). Replacing the Integer-based random row keys with
> UUIDs solved the problem and produced exactly 10485760 rows. However, that
> makes the key 16 bytes instead of the current 10, so I'm not sure it's an
> acceptable solution.
> Here's the UUID code I used:
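> import java.util.UUID;
>
> // Packs the 128 bits of a UUID into a 16-byte big-endian row key.
> public static byte[] format(final UUID uuid) {
>   long msb = uuid.getMostSignificantBits();
>   long lsb = uuid.getLeastSignificantBits();
>   byte[] buffer = new byte[16];
>   // Bytes 0-7 hold the high 64 bits, most significant byte first.
>   for (int i = 0; i < 8; i++) {
>     buffer[i] = (byte) (msb >>> 8 * (7 - i));
>   }
>   // Bytes 8-15 hold the low 64 bits, most significant byte first.
>   for (int i = 8; i < 16; i++) {
>     buffer[i] = (byte) (lsb >>> 8 * (15 - i));
>   }
>   return buffer;
> }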
> which is invoked within getRandomRow with
> return format(UUID.randomUUID());
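A rough check of the collision explanation, not part of the original report: if each of the N writes draws its key independently and uniformly from [0, N), a given key is missed by every draw with probability (1 - 1/N)^N, which is about 1/e, so the expected number of distinct keys is N(1 - (1 - 1/N)^N), about 0.632 N. For N = 10485760 that is roughly 6.63 million, close to the 6549899 and 6627689 rows reported, and the implied ~37% shortfall is in line with the ~35% seen with 50 clients. The sketch computes that expectation and simulates the draws. It assumes getRandomRow reduces a non-negative random int with % totalRows; the nextInt(Integer.MAX_VALUE) call and the class and method names are hypothetical.

```java
import java.util.BitSet;
import java.util.Random;

/**
 * Sketch, not HBase code: estimates how many distinct row keys survive when
 * every write draws its key as nextInt(Integer.MAX_VALUE) % totalRows.
 * Writes to a duplicate key overwrite each other, so the table ends up
 * holding only the distinct keys.
 */
final class KeyCollisionEstimate {

  /** Expected number of distinct keys after n uniform draws from [0, n). */
  static double expectedDistinct(long n) {
    // A given key is missed by all n draws with probability (1 - 1/n)^n.
    return n * (1.0 - Math.pow(1.0 - 1.0 / n, n));
  }

  /** Simulates n draws reduced modulo n and counts the distinct results. */
  static long simulateDistinct(int n, long seed) {
    Random random = new Random(seed);
    BitSet seen = new BitSet(n);
    for (int i = 0; i < n; i++) {
      seen.set(random.nextInt(Integer.MAX_VALUE) % n);
    }
    return seen.cardinality();
  }

  public static void main(String[] args) {
    int totalRows = 10 * 1024 * 1024; // randomWrite with 10 clients
    System.out.printf("expected distinct:  %.0f%n", expectedDistinct(totalRows));
    System.out.printf("simulated distinct: %d%n", simulateDistinct(totalRows, 42L));
  }
}
```

Running main should print an expected count of roughly 6.63 million and a simulated count close to it; the exact simulated value depends on the seed.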