[ 
https://issues.apache.org/jira/browse/HBASE-29013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HBASE-29013.
-------------------------------
    Fix Version/s: 2.7.0
                   3.0.0-beta-2
                   2.5.11
                   2.6.2
     Hadoop Flags: Reviewed
       Resolution: Fixed

Pushed to all active branches.

Thanks [~junegunn] for contributing!

> Make PerformanceEvaluation support larger data sets
> ---------------------------------------------------
>
>                 Key: HBASE-29013
>                 URL: https://issues.apache.org/jira/browse/HBASE-29013
>             Project: HBase
>          Issue Type: Improvement
>          Components: PE
>            Reporter: Junegunn Choi
>            Assignee: Junegunn Choi
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 2.7.0, 3.0.0-beta-2, 2.5.11, 2.6.2
>
>
> The use of 4-byte integers in PerformanceEvaluation can be limiting when you 
> want to test with larger data sets. For example, to generate 10TB of data 
> with the default value size of 1KB, you would need 10G rows.
> {code:java}
> bin/hbase pe --nomapred --presplit=21 --compress=LZ4 --rows=10737418240 
> randomWrite 1
> {code}
> But you can't do it because {{--rows}} expects a number that can be 
> represented with 4 bytes.
> {noformat}
> java.lang.NumberFormatException: For input string: "10737418240"
> {noformat}
> We can instead increase the value size and decrease the number of rows to 
> circumvent the limitation, but I don't see a good reason to have the 
> limitation in the first place.
> And even if we use a smaller value for {{--rows}}, we can accidentally 
> cause integer overflow as we increase the number of clients.
> {code:java}
> bin/hbase pe --nomapred --compress=LZ4 --rows=1073741824 randomWrite 20
> {code}
> {noformat}
> 2024-12-03T12:21:10,333 INFO  [main {}] hbase.PerformanceEvaluation: Created 
> 20 connections for 20 threads
> 2024-12-03T12:21:10,337 INFO  [TestClient-5 {}] hbase.PerformanceEvaluation: 
> Start class org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest at 
> offset 1073741824 for 1073741824 rows
> 2024-12-03T12:21:10,337 INFO  [TestClient-1 {}] hbase.PerformanceEvaluation: 
> Start class org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest at 
> offset 1073741824 for 1073741824 rows
> 2024-12-03T12:21:10,337 INFO  [TestClient-3 {}] hbase.PerformanceEvaluation: 
> Start class org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest at 
> offset -1073741824 for 1073741824 rows
> 2024-12-03T12:21:10,337 INFO  [TestClient-4 {}] hbase.PerformanceEvaluation: 
> Start class org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest at 
> offset 0 for 1073741824 rows
> 2024-12-03T12:21:10,337 INFO  [TestClient-7 {}] hbase.PerformanceEvaluation: 
> Start class org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest at 
> offset -1073741824 for 1073741824 rows
> 2024-12-03T12:21:10,337 INFO  [TestClient-8 {}] hbase.PerformanceEvaluation: 
> Start class org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest at 
> offset 0 for 1073741824 rows
> ...
> 2024-12-03T12:21:10,338 INFO  [TestClient-17 {}] hbase.PerformanceEvaluation: 
> Sampling 1 every 0 out of 1073741824 total rows.
> 2024-12-03T12:21:10,338 INFO  [TestClient-16 {}] hbase.PerformanceEvaluation: 
> Sampling 1 every 0 out of 1073741824 total rows.
> 2024-12-03T12:21:10,338 INFO  [TestClient-6 {}] hbase.PerformanceEvaluation: 
> Sampling 1 every 0 out of 1073741824 total rows.
> 2024-12-03T12:21:10,338 INFO  [TestClient-4 {}] hbase.PerformanceEvaluation: 
> Sampling 1 every 0 out of 1073741824 total rows.
> ...
> java.io.IOException: java.lang.ArithmeticException: / by zero
>         at 
> org.apache.hadoop.hbase.PerformanceEvaluation.doLocalClients(PerformanceEvaluation.java:540)
>         at 
> org.apache.hadoop.hbase.PerformanceEvaluation.runTest(PerformanceEvaluation.java:2674)
>         at 
> org.apache.hadoop.hbase.PerformanceEvaluation.run(PerformanceEvaluation.java:3216)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:97)
>         at 
> org.apache.hadoop.hbase.PerformanceEvaluation.main(PerformanceEvaluation.java:3250)
> {noformat}
> So I think it's best that we just use 8-byte long integers throughout the 
> code.
>  
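
Both failures quoted above can be reproduced in isolation. The following is a minimal sketch, not the PerformanceEvaluation code: the class and method names are hypothetical, and the offset formula (per-client rows multiplied by the client index) is an assumption that happens to match the logged offsets. For instance, 3 x 1073741824 wraps to -1073741824 in 4-byte arithmetic, and 4 x 1073741824 wraps to 0.

```java
public class RowCountWidth {
    // Hypothetical stand-in for reading --rows as a 4-byte int.
    static boolean parsesAsInt(String rows) {
        try {
            Integer.parseInt(rows);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Assumed offset formula, evaluated in int arithmetic.
    // Java int multiplication wraps silently on overflow.
    static int intOffset(int perClientRows, int clientIndex) {
        return perClientRows * clientIndex;
    }

    // Same formula with the row count widened to long,
    // so the multiplication is carried out in 8 bytes.
    static long longOffset(long perClientRows, int clientIndex) {
        return perClientRows * clientIndex;
    }

    public static void main(String[] args) {
        // 10737418240 does not fit in an int, but parses fine as a long.
        System.out.println("fits in int: " + parsesAsInt("10737418240"));
        System.out.println("as long:     " + Long.parseLong("10737418240"));

        int rows = 1073741824; // the --rows value from the second command
        for (int client : new int[] {1, 3, 4}) {
            System.out.printf("client %d: int offset %d, long offset %d%n",
                client, intOffset(rows, client), longOffset(rows, client));
        }

        // Rows across all 20 clients: wraps to 0 as an int, which is the
        // kind of value that can later end up as a divisor.
        System.out.printf("total: int %d, long %d%n",
            intOffset(rows, 20), longOffset(rows, 20));
    }
}
```

Where widening to long is not an option, Math.multiplyExact throws an ArithmeticException on overflow instead of wrapping, which at least surfaces the problem at the point of the multiplication.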



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
