[ https://issues.apache.org/jira/browse/HBASE-29013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17908123#comment-17908123 ]

Hudson commented on HBASE-29013:
--------------------------------

Results for branch master
        [build #1231 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/1231/]: 
(x) *{color:red}-1 overall{color}*
----
details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/1231/General_20Nightly_20Build_20Report/]

(x) {color:red}-1 jdk17 hadoop3 checks{color}
-- For more information [see jdk17 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/1231/JDK17_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(x) {color:red}-1 jdk17 hadoop ${HADOOP_THREE_VERSION} backward compatibility 
checks{color}
-- For more information [see jdk17 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/1231/JDK17_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test for 3.3.5 {color}
(/) {color:green}+1 client integration test for 3.3.6 {color}
(/) {color:green}+1 client integration test for 3.4.0 {color}
(/) {color:green}+1 client integration test for 3.4.1 {color}


> Make PerformanceEvaluation support larger data sets
> ---------------------------------------------------
>
>                 Key: HBASE-29013
>                 URL: https://issues.apache.org/jira/browse/HBASE-29013
>             Project: HBase
>          Issue Type: Improvement
>          Components: PE
>            Reporter: Junegunn Choi
>            Assignee: Junegunn Choi
>            Priority: Minor
>              Labels: pull-request-available
>
> The use of 4-byte integers in PerformanceEvaluation can be limiting when you 
> want to test with larger data sets. Suppose you want to generate 10TB of data 
> with the default value size of 1KB: you would need 10G rows.
> {code:java}
> bin/hbase pe --nomapred --presplit=21 --compress=LZ4 --rows=10737418240 
> randomWrite 1
> {code}
> But you can't, because {{--rows}} expects a number that fits in a 4-byte 
> integer.
> {noformat}
> java.lang.NumberFormatException: For input string: "10737418240"
> {noformat}
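> For reference, 10G rows is 10,737,418,240, which is well beyond Integer.MAX_VALUE 
> (2,147,483,647). A minimal standalone sketch of the failure mode (illustrative 
> only, not the actual PE option parsing):
> {code:java}
> public class RowCountParseDemo {
>   public static void main(String[] args) {
>     String rows = "10737418240"; // 10G rows, larger than Integer.MAX_VALUE
>     System.out.println(Long.parseLong(rows));   // fine: 10737418240
>     System.out.println(Integer.parseInt(rows)); // NumberFormatException: For input string: "10737418240"
>   }
> }
> {code}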
> We can instead increase the value size and decrease the number of rows to 
> circumvent the limitation, but I don't see a good reason to have the 
> limitation in the first place.
> And even if we use a smaller value for {{--rows}}, we can accidentally cause 
> integer overflow as we increase the number of clients.
> {code:java}
> bin/hbase pe --nomapred --compress=LZ4 --rows=1073741824 randomWrite 20
> {code}
> {noformat}
> 2024-12-03T12:21:10,333 INFO  [main {}] hbase.PerformanceEvaluation: Created 
> 20 connections for 20 threads
> 2024-12-03T12:21:10,337 INFO  [TestClient-5 {}] hbase.PerformanceEvaluation: 
> Start class org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest at 
> offset 1073741824 for 1073741824 rows
> 2024-12-03T12:21:10,337 INFO  [TestClient-1 {}] hbase.PerformanceEvaluation: 
> Start class org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest at 
> offset 1073741824 for 1073741824 rows
> 2024-12-03T12:21:10,337 INFO  [TestClient-3 {}] hbase.PerformanceEvaluation: 
> Start class org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest at 
> offset -1073741824 for 1073741824 rows
> 2024-12-03T12:21:10,337 INFO  [TestClient-4 {}] hbase.PerformanceEvaluation: 
> Start class org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest at 
> offset 0 for 1073741824 rows
> 2024-12-03T12:21:10,337 INFO  [TestClient-7 {}] hbase.PerformanceEvaluation: 
> Start class org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest at 
> offset -1073741824 for 1073741824 rows
> 2024-12-03T12:21:10,337 INFO  [TestClient-8 {}] hbase.PerformanceEvaluation: 
> Start class org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest at 
> offset 0 for 1073741824 rows
> ...
> 2024-12-03T12:21:10,338 INFO  [TestClient-17 {}] hbase.PerformanceEvaluation: 
> Sampling 1 every 0 out of 1073741824 total rows.
> 2024-12-03T12:21:10,338 INFO  [TestClient-16 {}] hbase.PerformanceEvaluation: 
> Sampling 1 every 0 out of 1073741824 total rows.
> 2024-12-03T12:21:10,338 INFO  [TestClient-6 {}] hbase.PerformanceEvaluation: 
> Sampling 1 every 0 out of 1073741824 total rows.
> 2024-12-03T12:21:10,338 INFO  [TestClient-4 {}] hbase.PerformanceEvaluation: 
> Sampling 1 every 0 out of 1073741824 total rows.
> ...
> java.io.IOException: java.lang.ArithmeticException: / by zero
>         at 
> org.apache.hadoop.hbase.PerformanceEvaluation.doLocalClients(PerformanceEvaluation.java:540)
>         at 
> org.apache.hadoop.hbase.PerformanceEvaluation.runTest(PerformanceEvaluation.java:2674)
>         at 
> org.apache.hadoop.hbase.PerformanceEvaluation.run(PerformanceEvaluation.java:3216)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:97)
>         at 
> org.apache.hadoop.hbase.PerformanceEvaluation.main(PerformanceEvaluation.java:3250)
> {noformat}
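> The offsets in the log are consistent with 32-bit wrap-around when the per-client 
> offset (client index times rows per client) is computed with int arithmetic. A 
> minimal standalone sketch (illustrative only, not the actual PerformanceEvaluation 
> code) of how int math produces exactly these values while long math does not:
> {code:java}
> public class OffsetOverflowDemo {
>   public static void main(String[] args) {
>     int perClientRows = 1073741824; // 2^30, the value passed via --rows
>     for (int i = 0; i < 8; i++) {
>       // int multiplication wraps: 0, 1073741824, -2147483648, -1073741824, 0, ...
>       int intOffset = i * perClientRows;
>       // widening to long before multiplying keeps the true offsets: 0, 2^30, 2^31, ...
>       long longOffset = (long) i * perClientRows;
>       System.out.println("client " + i + ": int offset = " + intOffset
>           + ", long offset = " + longOffset);
>     }
>   }
> }
> {code}
> A similar int truncation is presumably what makes the sampling interval come out 
> as 0 ("Sampling 1 every 0"), which is what then triggers the "/ by zero" above.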
> So I think it's best that we just use 8-byte long integers throughout the 
> code.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
