[jira] [Updated] (HBASE-25346) hbase2.x the speed of writing data is slower than version 1.x

nilonealex (Jira) Tue, 01 Dec 2020 01:59:36 -0800


     [ https://issues.apache.org/jira/browse/HBASE-25346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


nilonealex updated HBASE-25346:
-------------------------------
    Description: 
Recently we found that our newly built HBase cluster was running somewhat 
slowly, so we began verifying data load and query performance on HBase 2.0.2 
(HDP 3.1.1) and HBase 1.2.0 (CDH 5.13.3). We found that putting data into 
HBase 2.0 is much slower than into HBase 1.x (the write throughput of the 
former is almost half that of the latter). I use BufferedMutator and 
BufferedMutatorParams for batch puts to improve efficiency. Part of the code 
is as follows:

-----------------------------------------------------------------------
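// Client-side batch of mutations, handed to the BufferedMutator every 5000 rows.
List<Mutation> mutator = new ArrayList<>();
BufferedMutator table = null;

// Configure the write buffer size (in MB) for the target table.
BufferedMutatorParams params =
        new BufferedMutatorParams(TableName.valueOf(fileHbRule.getHbaseTableName()));
// 1024L forces long arithmetic so that large buffer sizes do not overflow int.
params.writeBufferSize(fileHbRule.getFlushBuffer().intValue() * 1024L * 1024L);
table = connection.getBufferedMutator(params);

// Executed for each input row; p is the mutation built from that row.
mutator.add(p);

if (totalCnts % 5000 == 0) {
        table.mutate(mutator);
        mutator.clear();
}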
-----------------------------------------------------------------------
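
The fragment above shows only the in-loop flush. As a minimal sketch of the same batching pattern, including a final flush for any trailing partial batch, something like the following could be used. The Batcher class and its sink are hypothetical stand-ins and not part of the HBase API. In the real loader the sink would wrap a call such as table.mutate(batch) and handle its IOException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: collects items and hands them to a sink in fixed-size batches. */
class Batcher<T> implements AutoCloseable {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> pending = new ArrayList<>();

    Batcher(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Buffers one item and flushes as soon as a full batch is available. */
    void add(T item) {
        pending.add(item);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is buffered to the sink; does nothing when the buffer is empty. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending)); // copy, so the sink may keep the list
        pending.clear();
    }

    /** Flushes the trailing partial batch so that no item is left behind. */
    @Override
    public void close() {
        flush();
    }
}

class BatcherDemo {
    public static void main(String[] args) {
        // The sink only prints batch sizes here; a real loader would write to HBase.
        try (Batcher<Integer> batcher =
                     new Batcher<>(5, batch -> System.out.println("flushing " + batch.size()))) {
            for (int i = 0; i < 12; i++) {
                batcher.add(i);
            }
        } // close() flushes the remaining items
    }
}
```

Note that BufferedMutator itself buffers mutations until the configured write buffer size is reached, so the client-side list mainly controls how often mutate is called.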

The input is a comma-separated text file with 2 million rows and 110 columns 
per row; its total size is about 1 GB. Apart from the main settings such as 
heap memory, I kept the default parameter values for most of the HBase 
services.
The load program is single-threaded.

The following is the progress output:

----------------------- HBase 1.2.0 (CDH 5.13.3) -----------------------
2020-12-01 16:48:18 inserted:  100000
2020-12-01 16:48:36 inserted:  200000
2020-12-01 16:48:52 inserted:  300000
2020-12-01 16:49:08 inserted:  400000
2020-12-01 16:49:23 inserted:  500000
2020-12-01 16:49:39 inserted:  600000
2020-12-01 16:49:56 inserted:  700000
2020-12-01 16:50:12 inserted:  800000
2020-12-01 16:50:29 inserted:  900000
2020-12-01 16:50:45 inserted:  1000000
2020-12-01 16:51:01 inserted:  1100000
2020-12-01 16:51:17 inserted:  1200000
2020-12-01 16:51:34 inserted:  1300000
2020-12-01 16:51:49 inserted:  1400000
2020-12-01 16:52:05 inserted:  1500000
2020-12-01 16:52:21 inserted:  1600000
2020-12-01 16:52:40 inserted:  1700000
2020-12-01 16:52:57 inserted:  1800000
2020-12-01 16:53:19 inserted:  1900000
2020-12-01 16:53:42 inserted:  2000000
2020-12-01 16:53:48 inserted:  2000000
imp finished ok! 
--job finished--

----------------------- HBase 2.0.2 (HDP 3.1.1) -----------------------
2020-12-01 17:25:24 inserted:  100000
2020-12-01 17:26:03 inserted:  200000
2020-12-01 17:26:39 inserted:  300000
2020-12-01 17:27:13 inserted:  400000
2020-12-01 17:27:47 inserted:  500000
2020-12-01 17:28:23 inserted:  600000
2020-12-01 17:29:03 inserted:  700000
2020-12-01 17:29:40 inserted:  800000
2020-12-01 17:30:15 inserted:  900000
2020-12-01 17:30:51 inserted:  1000000
2020-12-01 17:31:27 inserted:  1100000
2020-12-01 17:32:03 inserted:  1200000
2020-12-01 17:32:39 inserted:  1300000
2020-12-01 17:33:14 inserted:  1400000
2020-12-01 17:33:50 inserted:  1500000
2020-12-01 17:34:25 inserted:  1600000
2020-12-01 17:35:01 inserted:  1700000
2020-12-01 17:35:38 inserted:  1800000
2020-12-01 17:36:14 inserted:  1900000
2020-12-01 17:36:51 inserted:  2000000
2020-12-01 17:36:55 inserted:  2000000
imp finished ok! 
--job finished--
returnCode=0
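
To make the "almost half" comparison concrete, the progress lines above can be converted into rows per second. The following is a minimal sketch that uses the first and last distinct progress lines of each run. The class and method names are illustrative only, and the parsing assumes the exact "<timestamp> inserted:  <count>" layout shown in the logs.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class ThroughputCheck {
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Average rows per second between two progress lines. */
    static double rowsPerSecond(String firstLine, String lastLine) {
        long seconds = Duration.between(timestamp(firstLine), timestamp(lastLine)).getSeconds();
        return (double) (count(lastLine) - count(firstLine)) / seconds;
    }

    /** The first 19 characters of a progress line hold the timestamp. */
    private static LocalDateTime timestamp(String line) {
        return LocalDateTime.parse(line.substring(0, 19), FMT);
    }

    /** The row count follows the last colon, i.e. the one after "inserted". */
    private static long count(String line) {
        return Long.parseLong(line.substring(line.lastIndexOf(':') + 1).trim());
    }

    public static void main(String[] args) {
        double v1 = rowsPerSecond("2020-12-01 16:48:18 inserted:  100000",
                                  "2020-12-01 16:53:42 inserted:  2000000");
        double v2 = rowsPerSecond("2020-12-01 17:25:24 inserted:  100000",
                                  "2020-12-01 17:36:51 inserted:  2000000");
        System.out.println("HBase 1.2.0 rows/s: " + Math.round(v1));
        System.out.println("HBase 2.0.2 rows/s: " + Math.round(v2));
        System.out.println("ratio 2.0.2 / 1.2.0: " + v2 / v1);
    }
}
```

For these logs, 1,900,000 rows took 324 seconds on HBase 1.2.0 and 687 seconds on HBase 2.0.2, which is roughly 5,860 versus 2,770 rows per second, a ratio of about 0.47.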

Is there some key configuration that I have missed, or does this version have 
a performance defect?

  was:
I am verifying data load and query performance on HBase 2.0.2 (HDP 3.1.1) and 
HBase 1.2.0 (CDH 5.13.3), and found that putting data into HBase 2.0 is much 
slower than into HBase 1.x (the write throughput of the former is almost half 
that of the latter). I use BufferedMutator and BufferedMutatorParams for batch 
puts to improve efficiency. Part of the code is as follows:

-----------------------------------------------------------------------
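// Client-side batch of mutations, handed to the BufferedMutator every 5000 rows.
List<Mutation> mutator = new ArrayList<>();
BufferedMutator table = null;

// Configure the write buffer size (in MB) for the target table.
BufferedMutatorParams params =
        new BufferedMutatorParams(TableName.valueOf(fileHbRule.getHbaseTableName()));
// 1024L forces long arithmetic so that large buffer sizes do not overflow int.
params.writeBufferSize(fileHbRule.getFlushBuffer().intValue() * 1024L * 1024L);
table = connection.getBufferedMutator(params);

// Executed for each input row; p is the mutation built from that row.
mutator.add(p);

if (totalCnts % 5000 == 0) {
        table.mutate(mutator);
        mutator.clear();
}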
-----------------------------------------------------------------------

The input is a comma-separated text file with 2 million rows and 110 columns 
per row; its total size is about 1 GB. Apart from the main settings such as 
heap memory, I kept the default parameter values for most of the HBase 
services.
The load program is single-threaded.

The following is the progress output:

----------------------- HBase 1.2.0 (CDH 5.13.3) -----------------------
2020-12-01 16:48:18 inserted:  100000
2020-12-01 16:48:36 inserted:  200000
2020-12-01 16:48:52 inserted:  300000
2020-12-01 16:49:08 inserted:  400000
2020-12-01 16:49:23 inserted:  500000
2020-12-01 16:49:39 inserted:  600000
2020-12-01 16:49:56 inserted:  700000
2020-12-01 16:50:12 inserted:  800000
2020-12-01 16:50:29 inserted:  900000
2020-12-01 16:50:45 inserted:  1000000
2020-12-01 16:51:01 inserted:  1100000
2020-12-01 16:51:17 inserted:  1200000
2020-12-01 16:51:34 inserted:  1300000
2020-12-01 16:51:49 inserted:  1400000
2020-12-01 16:52:05 inserted:  1500000
2020-12-01 16:52:21 inserted:  1600000
2020-12-01 16:52:40 inserted:  1700000
2020-12-01 16:52:57 inserted:  1800000
2020-12-01 16:53:19 inserted:  1900000
2020-12-01 16:53:42 inserted:  2000000
2020-12-01 16:53:48 inserted:  2000000
imp finished ok! 
--job finished--

----------------------- HBase 2.0.2 (HDP 3.1.1) -----------------------
2020-12-01 17:25:24 inserted:  100000
2020-12-01 17:26:03 inserted:  200000
2020-12-01 17:26:39 inserted:  300000
2020-12-01 17:27:13 inserted:  400000
2020-12-01 17:27:47 inserted:  500000
2020-12-01 17:28:23 inserted:  600000
2020-12-01 17:29:03 inserted:  700000
2020-12-01 17:29:40 inserted:  800000
2020-12-01 17:30:15 inserted:  900000
2020-12-01 17:30:51 inserted:  1000000
2020-12-01 17:31:27 inserted:  1100000
2020-12-01 17:32:03 inserted:  1200000
2020-12-01 17:32:39 inserted:  1300000
2020-12-01 17:33:14 inserted:  1400000
2020-12-01 17:33:50 inserted:  1500000
2020-12-01 17:34:25 inserted:  1600000
2020-12-01 17:35:01 inserted:  1700000
2020-12-01 17:35:38 inserted:  1800000
2020-12-01 17:36:14 inserted:  1900000
2020-12-01 17:36:51 inserted:  2000000
2020-12-01 17:36:55 inserted:  2000000
imp finished ok! 
--job finished--
returnCode=0

Is there some key configuration that I have missed, or does this version have 
a performance defect?


> hbase2.x the speed of writing data is slower than version 1.x
> -------------------------------------------------------------
>
>                 Key: HBASE-25346
>                 URL: https://issues.apache.org/jira/browse/HBASE-25346
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 2.0.2
>            Reporter: nilonealex
>            Priority: Critical
>         Attachments: hbase-site.xml
>
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
