[
https://issues.apache.org/jira/browse/HBASE-29013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17908985#comment-17908985
]
Hudson commented on HBASE-29013:
--------------------------------
Results for branch master
[build #1233 on
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/1233/]:
(/) *{color:green}+1 overall{color}*
----
details (if available):
(/) {color:green}+1 general checks{color}
-- For more information [see general
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/1233/General_20Nightly_20Build_20Report/]
(/) {color:green}+1 jdk17 hadoop3 checks{color}
-- For more information [see jdk17
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/1233/JDK17_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 jdk17 hadoop ${HADOOP_THREE_VERSION} backward compatibility
checks{color}
-- For more information [see jdk17
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/1233/JDK17_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 source release artifact{color}
-- See build output for details.
(/) {color:green}+1 client integration test for 3.3.5 {color}
(/) {color:green}+1 client integration test for 3.3.6 {color}
(/) {color:green}+1 client integration test for 3.4.0 {color}
(/) {color:green}+1 client integration test for 3.4.1 {color}
> Make PerformanceEvaluation support larger data sets
> ---------------------------------------------------
>
> Key: HBASE-29013
> URL: https://issues.apache.org/jira/browse/HBASE-29013
> Project: HBase
> Issue Type: Improvement
> Components: PE
> Reporter: Junegunn Choi
> Assignee: Junegunn Choi
> Priority: Minor
> Labels: pull-request-available
> Fix For: 2.7.0, 3.0.0-beta-2, 2.5.11, 2.6.2
>
>
> The use of 4-byte integers in PerformanceEvaluation can be limiting when you
> want to test with larger data sets. To generate 10TB of data with the default
> value size of 1KB, for example, you would need 10G rows.
> {code:java}
> bin/hbase pe --nomapred --presplit=21 --compress=LZ4 --rows=10737418240
> randomWrite 1
> {code}
> But you can't, because {{--rows}} expects a number that can be represented
> with 4 bytes.
> {noformat}
> java.lang.NumberFormatException: For input string: "10737418240"
> {noformat}
> We can circumvent the limitation by increasing the value size and decreasing
> the number of rows, but I don't see a good reason to have the limitation in
> the first place.
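> As an illustrative sketch (not taken from the HBase source), the failure mode
> is simply a 4-byte parse rejecting values above {{Integer.MAX_VALUE}}, which
> an 8-byte parse handles fine:

```java
public class RowsParseDemo {
    public static void main(String[] args) {
        // 10G rows, larger than Integer.MAX_VALUE (2147483647)
        String rows = "10737418240";
        try {
            Integer.parseInt(rows); // 4-byte parse: throws NumberFormatException
        } catch (NumberFormatException e) {
            System.out.println("int parse failed: " + e.getMessage());
        }
        // 8-byte parse succeeds
        System.out.println("long parse ok: " + Long.parseLong(rows));
    }
}
```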
> And even if we use a smaller value for {{--rows}}, we can accidentally cause
> integer overflow as we increase the number of clients.
> {code:java}
> bin/hbase pe --nomapred --compress=LZ4 --rows=1073741824 randomWrite 20
> {code}
> {noformat}
> 2024-12-03T12:21:10,333 INFO [main {}] hbase.PerformanceEvaluation: Created
> 20 connections for 20 threads
> 2024-12-03T12:21:10,337 INFO [TestClient-5 {}] hbase.PerformanceEvaluation:
> Start class org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest at
> offset 1073741824 for 1073741824 rows
> 2024-12-03T12:21:10,337 INFO [TestClient-1 {}] hbase.PerformanceEvaluation:
> Start class org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest at
> offset 1073741824 for 1073741824 rows
> 2024-12-03T12:21:10,337 INFO [TestClient-3 {}] hbase.PerformanceEvaluation:
> Start class org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest at
> offset -1073741824 for 1073741824 rows
> 2024-12-03T12:21:10,337 INFO [TestClient-4 {}] hbase.PerformanceEvaluation:
> Start class org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest at
> offset 0 for 1073741824 rows
> 2024-12-03T12:21:10,337 INFO [TestClient-7 {}] hbase.PerformanceEvaluation:
> Start class org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest at
> offset -1073741824 for 1073741824 rows
> 2024-12-03T12:21:10,337 INFO [TestClient-8 {}] hbase.PerformanceEvaluation:
> Start class org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest at
> offset 0 for 1073741824 rows
> ...
> 2024-12-03T12:21:10,338 INFO [TestClient-17 {}] hbase.PerformanceEvaluation:
> Sampling 1 every 0 out of 1073741824 total rows.
> 2024-12-03T12:21:10,338 INFO [TestClient-16 {}] hbase.PerformanceEvaluation:
> Sampling 1 every 0 out of 1073741824 total rows.
> 2024-12-03T12:21:10,338 INFO [TestClient-6 {}] hbase.PerformanceEvaluation:
> Sampling 1 every 0 out of 1073741824 total rows.
> 2024-12-03T12:21:10,338 INFO [TestClient-4 {}] hbase.PerformanceEvaluation:
> Sampling 1 every 0 out of 1073741824 total rows.
> ...
> java.io.IOException: java.lang.ArithmeticException: / by zero
> at org.apache.hadoop.hbase.PerformanceEvaluation.doLocalClients(PerformanceEvaluation.java:540)
> at org.apache.hadoop.hbase.PerformanceEvaluation.runTest(PerformanceEvaluation.java:2674)
> at org.apache.hadoop.hbase.PerformanceEvaluation.run(PerformanceEvaluation.java:3216)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:97)
> at org.apache.hadoop.hbase.PerformanceEvaluation.main(PerformanceEvaluation.java:3250)
> {noformat}
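> A minimal sketch of the wraparound (hypothetical variable names, not the
> actual PE code): each client's start offset is the client index times the
> per-client row count, and with 32-bit ints that product wraps past
> {{Integer.MAX_VALUE}}, producing the negative offsets seen in the log above.
> Widening the arithmetic to {{long}} keeps the offsets correct:

```java
public class OffsetOverflowDemo {
    public static void main(String[] args) {
        int rowsPerClient = 1073741824; // 2^30, the --rows value from the example
        for (int client = 0; client < 4; client++) {
            int intOffset = client * rowsPerClient;          // 32-bit math wraps
            long longOffset = (long) client * rowsPerClient; // 8-byte math stays correct
            System.out.println("client " + client
                + ": int=" + intOffset + " long=" + longOffset);
        }
    }
}
```

> The {{/ by zero}} presumably follows once a wrapped or truncated total row
> count feeds the sampling-interval division ("Sampling 1 every 0" in the log).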
> So I think it's best to simply use 8-byte {{long}} integers throughout the
> code.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)