[ https://issues.apache.org/jira/browse/HBASE-29013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Duo Zhang resolved HBASE-29013.
-------------------------------
    Fix Version/s: 2.7.0
                   3.0.0-beta-2
                   2.5.11
                   2.6.2
     Hadoop Flags: Reviewed
       Resolution: Fixed

Pushed to all active branches.

Thanks [~junegunn] for contributing!

> Make PerformanceEvaluation support larger data sets
> ---------------------------------------------------
>
>                 Key: HBASE-29013
>                 URL: https://issues.apache.org/jira/browse/HBASE-29013
>             Project: HBase
>          Issue Type: Improvement
>          Components: PE
>            Reporter: Junegunn Choi
>            Assignee: Junegunn Choi
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 2.7.0, 3.0.0-beta-2, 2.5.11, 2.6.2
>
>
> The use of 4-byte integers in PerformanceEvaluation can be limiting when you
> want to test with larger data sets. To generate 10TB of data with the default
> value size of 1KB, you would need 10G rows.
> {code:java}
> bin/hbase pe --nomapred --presplit=21 --compress=LZ4 --rows=10737418240 randomWrite 1
> {code}
> But you can't do it because {{--rows}} expects a number that can be
> represented with 4 bytes.
> {noformat}
> java.lang.NumberFormatException: For input string: "10737418240"
> {noformat}
> We can instead increase the value size and decrease the number of rows to
> circumvent the limitation, but I don't see a good reason to have the
> limitation in the first place.
> And even if we use a smaller value for {{--rows}}, we can accidentally
> cause integer overflow as we increase the number of clients.
> {code:java}
> bin/hbase pe --nomapred --compress=LZ4 --rows=1073741824 randomWrite 20
> {code}
> {noformat}
> 2024-12-03T12:21:10,333 INFO [main {}] hbase.PerformanceEvaluation: Created 20 connections for 20 threads
> 2024-12-03T12:21:10,337 INFO [TestClient-5 {}] hbase.PerformanceEvaluation: Start class org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest at offset 1073741824 for 1073741824 rows
> 2024-12-03T12:21:10,337 INFO [TestClient-1 {}] hbase.PerformanceEvaluation: Start class org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest at offset 1073741824 for 1073741824 rows
> 2024-12-03T12:21:10,337 INFO [TestClient-3 {}] hbase.PerformanceEvaluation: Start class org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest at offset -1073741824 for 1073741824 rows
> 2024-12-03T12:21:10,337 INFO [TestClient-4 {}] hbase.PerformanceEvaluation: Start class org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest at offset 0 for 1073741824 rows
> 2024-12-03T12:21:10,337 INFO [TestClient-7 {}] hbase.PerformanceEvaluation: Start class org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest at offset -1073741824 for 1073741824 rows
> 2024-12-03T12:21:10,337 INFO [TestClient-8 {}] hbase.PerformanceEvaluation: Start class org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest at offset 0 for 1073741824 rows
> ...
> 2024-12-03T12:21:10,338 INFO [TestClient-17 {}] hbase.PerformanceEvaluation: Sampling 1 every 0 out of 1073741824 total rows.
> 2024-12-03T12:21:10,338 INFO [TestClient-16 {}] hbase.PerformanceEvaluation: Sampling 1 every 0 out of 1073741824 total rows.
> 2024-12-03T12:21:10,338 INFO [TestClient-6 {}] hbase.PerformanceEvaluation: Sampling 1 every 0 out of 1073741824 total rows.
> 2024-12-03T12:21:10,338 INFO [TestClient-4 {}] hbase.PerformanceEvaluation: Sampling 1 every 0 out of 1073741824 total rows.
> ...
> java.io.IOException: java.lang.ArithmeticException: / by zero
>     at org.apache.hadoop.hbase.PerformanceEvaluation.doLocalClients(PerformanceEvaluation.java:540)
>     at org.apache.hadoop.hbase.PerformanceEvaluation.runTest(PerformanceEvaluation.java:2674)
>     at org.apache.hadoop.hbase.PerformanceEvaluation.run(PerformanceEvaluation.java:3216)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:97)
>     at org.apache.hadoop.hbase.PerformanceEvaluation.main(PerformanceEvaluation.java:3250)
> {noformat}
> So I think it's best that we just use 8-byte long integers throughout the
> code.
>

--
This message was sent by Atlassian Jira (v8.20.10#820010)
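
The two failure modes described in the issue can be reproduced in isolation with a few lines of Java. This is only a sketch, not code taken from PerformanceEvaluation: the class and method names are hypothetical, and the offset formula (client index times rows per client) is an assumption chosen because it is consistent with the offsets in the quoted log. It shows that "10737418240" cannot be parsed as an int but parses as a long, and that products of 1073741824 wrap around in 4-byte arithmetic while staying correct in 8-byte arithmetic.

```java
public class RowMathSketch {

    // Hypothetical helper: does the --rows argument fit in a 4-byte int?
    static boolean fitsInInt(String rows) {
        try {
            Integer.parseInt(rows);
            return true;
        } catch (NumberFormatException e) {
            return false; // the value is outside the int range or not a number
        }
    }

    // Assumed offset formula in 4-byte arithmetic; the product wraps silently.
    static int intOffset(int rowsPerClient, int clientIndex) {
        return clientIndex * rowsPerClient;
    }

    // Same formula in 8-byte arithmetic; clientIndex is widened to long first.
    static long longOffset(long rowsPerClient, int clientIndex) {
        return clientIndex * rowsPerClient;
    }

    // Total row count across all clients, in int and in long arithmetic.
    static int intTotal(int rowsPerClient, int clients) {
        return rowsPerClient * clients;
    }

    static long longTotal(long rowsPerClient, int clients) {
        return rowsPerClient * clients;
    }

    public static void main(String[] args) {
        String tenGiRows = "10737418240";
        System.out.println("fits in int: " + fitsInInt(tenGiRows));
        System.out.println("parsed as long: " + Long.parseLong(tenGiRows));

        int rows = 1073741824; // 2^30, the --rows value from the second command
        for (int i = 0; i <= 8; i++) {
            System.out.printf("client %d: int offset %d, long offset %d%n",
                i, intOffset(rows, i), longOffset(rows, i));
        }
        System.out.println("int total for 20 clients: " + intTotal(rows, 20));
        System.out.println("long total for 20 clients: " + longTotal(rows, 20));
    }
}
```

Under the assumed formula, the int offsets for clients 3 and 4 come out as -1073741824 and 0, matching the TestClient-3 and TestClient-4 lines in the log, and the int total for 20 clients wraps to 0. A zero total of that kind would be one plausible source of the "Sampling 1 every 0" lines and the division by zero, though the exact expression inside PerformanceEvaluation is not shown in the issue.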