[
https://issues.apache.org/jira/browse/HBASE-25346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
nilonealex updated HBASE-25346:
-------------------------------
Description:
Recently we found that our newly built production HBase cluster is running a
bit slow. The HBase version is 2.0.2 (HDP 3.1.1) and the cluster has 100
nodes. We then compared load and query performance between HBase 2.0.2
(HDP 3.1.1) and HBase 1.2.0 (CDH 5.13.3) in a test environment (4 nodes) and
found that putting data on HBase 2.0 is much slower than on HBase 1.x (the
throughput of the former is about half that of the latter). I use
BufferedMutator with BufferedMutatorParams for batched puts to improve
efficiency. More confusingly, the production environment performs worse than
my test environment.
Part of the code is as follows:
-----------------------------------------------------------------------
List<Mutation> mutator = new ArrayList<>();
BufferedMutator table = null;
BufferedMutatorParams params =
    new BufferedMutatorParams(TableName.valueOf(fileHbRule.getHbaseTableName()));
// Client-side write buffer size in bytes (the configured value is in MB).
params.writeBufferSize(fileHbRule.getFlushBuffer().intValue() * 1024L * 1024L);
table = connection.getBufferedMutator(params);

// For each input row: p is the Put built for the current row.
mutator.add(p);
// Hand the accumulated mutations to the BufferedMutator every 5000 rows.
if (totalCnts % 5000 == 0) {
    table.mutate(mutator);
    mutator.clear();
}
-----------------------------------------------------------------------
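The batching pattern in the fragment above can be sketched in isolation as follows. This is only an illustration under stated assumptions: `BatchingWriter` and its `sink` callback are hypothetical names, not part of the loader or of the HBase API, and the sink stands in for a call such as `table.mutate(batch)`. The fragment does not show what happens to rows left over after the last full batch of 5000, so the sketch adds an explicit final flush.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: collects items and hands them to a sink in fixed-size batches.
final class BatchingWriter<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;   // stand-in for table.mutate(batch)
    private final List<T> pending = new ArrayList<>();

    BatchingWriter(int batchSize, Consumer<List<T>> sink) {
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Queue one item; emit a batch as soon as batchSize items are pending.
    void add(T item) {
        pending.add(item);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Emit whatever is pending; call once after the input is exhausted.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending)); // copy, so clearing does not affect the sink
        pending.clear();
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        BatchingWriter<Integer> writer =
                new BatchingWriter<>(5000, batch -> batchSizes.add(batch.size()));
        for (int i = 0; i < 12_345; i++) {
            writer.add(i);
        }
        writer.flush(); // without this, the last 2345 items would never be emitted
        System.out.println(batchSizes); // prints [5000, 5000, 2345]
    }
}
```

In a real loader the BufferedMutator keeps its own write buffer as well, so a final `flush()` or `close()` on it is still needed after the last batch has been handed over.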
The input is a comma-separated text file with 2 million rows and 110 columns
per row; its total size is about 1 GB. Apart from the main parameters such as
heap memory, I kept the default values for most of the HBase services.
The load program is single-threaded.
The following is the progress information:
----------------------- HBase 1.2.0 (CDH 5.13.3) -----------------------
2020-12-01 16:48:18 inserted: 100000
2020-12-01 16:48:36 inserted: 200000
2020-12-01 16:48:52 inserted: 300000
2020-12-01 16:49:08 inserted: 400000
2020-12-01 16:49:23 inserted: 500000
2020-12-01 16:49:39 inserted: 600000
2020-12-01 16:49:56 inserted: 700000
2020-12-01 16:50:12 inserted: 800000
2020-12-01 16:50:29 inserted: 900000
2020-12-01 16:50:45 inserted: 1000000
2020-12-01 16:51:01 inserted: 1100000
2020-12-01 16:51:17 inserted: 1200000
2020-12-01 16:51:34 inserted: 1300000
2020-12-01 16:51:49 inserted: 1400000
2020-12-01 16:52:05 inserted: 1500000
2020-12-01 16:52:21 inserted: 1600000
2020-12-01 16:52:40 inserted: 1700000
2020-12-01 16:52:57 inserted: 1800000
2020-12-01 16:53:19 inserted: 1900000
2020-12-01 16:53:42 inserted: 2000000
2020-12-01 16:53:48 inserted: 2000000
imp finished ok!
--job finished--
----------------------- HBase 2.0.2 (HDP 3.1.1) -----------------------
2020-12-01 17:25:24 inserted: 100000
2020-12-01 17:26:03 inserted: 200000
2020-12-01 17:26:39 inserted: 300000
2020-12-01 17:27:13 inserted: 400000
2020-12-01 17:27:47 inserted: 500000
2020-12-01 17:28:23 inserted: 600000
2020-12-01 17:29:03 inserted: 700000
2020-12-01 17:29:40 inserted: 800000
2020-12-01 17:30:15 inserted: 900000
2020-12-01 17:30:51 inserted: 1000000
2020-12-01 17:31:27 inserted: 1100000
2020-12-01 17:32:03 inserted: 1200000
2020-12-01 17:32:39 inserted: 1300000
2020-12-01 17:33:14 inserted: 1400000
2020-12-01 17:33:50 inserted: 1500000
2020-12-01 17:34:25 inserted: 1600000
2020-12-01 17:35:01 inserted: 1700000
2020-12-01 17:35:38 inserted: 1800000
2020-12-01 17:36:14 inserted: 1900000
2020-12-01 17:36:51 inserted: 2000000
2020-12-01 17:36:55 inserted: 2000000
imp finished ok!
--job finished--
returnCode=0
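The throughput implied by the two logs above can be worked out from the timestamps. The following is a minimal sketch, not part of the loader: the class and method names are made up for illustration. It measures from the 100000-row line to the first 2000000-row line, because the start time of each run is not in the log.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for turning the progress-log timestamps into rows per second.
final class LoadThroughput {
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Seconds elapsed between two log timestamps.
    static long elapsedSeconds(String start, String end) {
        return Duration.between(LocalDateTime.parse(start, FMT),
                                LocalDateTime.parse(end, FMT)).getSeconds();
    }

    // Average rows per second over the given interval.
    static double rowsPerSecond(long rows, long seconds) {
        return (double) rows / seconds;
    }

    public static void main(String[] args) {
        // Rows written between the 100000 and 2000000 progress lines.
        long rows = 2_000_000L - 100_000L;

        long v1Seconds = elapsedSeconds("2020-12-01 16:48:18", "2020-12-01 16:53:42");
        long v2Seconds = elapsedSeconds("2020-12-01 17:25:24", "2020-12-01 17:36:51");

        System.out.printf("HBase 1.2.0: %d s, %.0f rows/s%n",
                v1Seconds, rowsPerSecond(rows, v1Seconds));
        System.out.printf("HBase 2.0.2: %d s, %.0f rows/s%n",
                v2Seconds, rowsPerSecond(rows, v2Seconds));
        System.out.printf("Slowdown: %.2fx%n", (double) v2Seconds / v1Seconds);
    }
}
```

For these logs the intervals are 324 s and 687 s, which is roughly 5,860 rows/s on HBase 1.2.0 against roughly 2,770 rows/s on HBase 2.0.2, a slowdown of about 2.1x. That is consistent with the "about half" estimate in the description.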
In addition, we ran some benchmark tests on the production cluster. The
latency seems a bit high. The detailed report is in the attachment.
Are there key configuration settings that I have missed, or does this version
have performance defects?
> hbase2.x the performance is lower than hbase 1.x ?
> ---------------------------------------------------
>
> Key: HBASE-25346
> URL: https://issues.apache.org/jira/browse/HBASE-25346
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 2.0.2
> Reporter: nilonealex
> Priority: Critical
> Attachments: hbase-site.xml