[ 
https://issues.apache.org/jira/browse/HBASE-16224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426407#comment-15426407
 ] 

ChiaPing Tsai commented on HBASE-16224:
---------------------------------------

hi [[email protected]]

Q1) In the new spreadsheet, for hbase-master(3, 2000, 2000, 6, seq), what is 
the meaning of 3 threads?
A1) Three threads submit data concurrently (same Connection instance, 
different table instances).

Q2) Do you know why hbase-16224(3, 2000, 2000, 6, seq) exhibited a much better 
speedup than hbase-16224(3, 2000, 1000, 6, seq)?
A2) BufferedMutatorImpl grabs only a few mutations when each mutation contains 
many KVs. If too many tasks are already running, those few mutations lead to 
busy-waiting and small requests, because the AP (AsyncProcess) keeps iterating 
over the same collection of rows.
This patch lets the AP access the inner buffer of BufferedMutatorImpl. This 
has two benefits: 1) The AP can take different rows on the next iteration if 
the current rows are located on busy regions or regionservers. 2) The AP can 
generate large requests because it can iterate over all rows instead of only 
a subset.
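
To make the difference concrete, here is a minimal sketch of the two 
selection strategies. It is not code from the patch: RowSelectionSketch, 
BufferedRow, submittableInPrefix and takeSubmittable are hypothetical names, 
and "busy" is reduced to a plain set of server names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for illustration only; these are NOT HBase classes.
final class RowSelectionSketch {

    /** A buffered mutation reduced to the two facts the selection needs. */
    static final class BufferedRow {
        final String rowKey;
        final String server; // server assumed to host the row's region

        BufferedRow(String rowKey, String server) {
            this.rowKey = rowKey;
            this.server = server;
        }
    }

    /**
     * Old behaviour (simplified): only the first `limit` rows are visible.
     * Returns the visible rows that are not on a busy server.
     */
    static List<BufferedRow> submittableInPrefix(
            List<BufferedRow> buffer, int limit, Set<String> busyServers) {
        List<BufferedRow> result = new ArrayList<>();
        int visible = Math.min(limit, buffer.size());
        for (BufferedRow row : buffer.subList(0, visible)) {
            if (!busyServers.contains(row.server)) {
                result.add(row);
            }
        }
        return result;
    }

    /**
     * Patched behaviour (simplified): walk the whole buffer, remove and return
     * every row whose server is not busy, leave the rest for a later pass.
     */
    static List<BufferedRow> takeSubmittable(
            List<BufferedRow> buffer, Set<String> busyServers) {
        List<BufferedRow> taken = new ArrayList<>();
        Iterator<BufferedRow> it = buffer.iterator();
        while (it.hasNext()) {
            BufferedRow row = it.next();
            if (!busyServers.contains(row.server)) {
                taken.add(row);
                it.remove();
            }
        }
        return taken;
    }

    public static void main(String[] args) {
        List<BufferedRow> buffer = new ArrayList<>(Arrays.asList(
                new BufferedRow("r1", "rs1"), new BufferedRow("r2", "rs1"),
                new BufferedRow("r3", "rs2"), new BufferedRow("r4", "rs3")));
        Set<String> busy = Collections.singleton("rs1");
        System.out.println(submittableInPrefix(buffer, 2, busy).size());
        System.out.println(takeSubmittable(buffer, busy).size());
    }
}
```

In this example rs1 is busy and the first two rows both live on rs1, so the 
prefix pass finds nothing to submit and has to wait, while the full pass 
takes r3 and r4 and leaves r1 and r2 in the buffer.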

In summary, BufferedMutatorImpl has no way of knowing which rows are "good" 
for the AP. If there are too many rows to process, it is likely that 
BufferedMutatorImpl grabs the "wrong" rows for the AP.

> Reduce the number of RPCs for the large PUTs
> --------------------------------------------
>
>                 Key: HBASE-16224
>                 URL: https://issues.apache.org/jira/browse/HBASE-16224
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: ChiaPing Tsai
>            Assignee: ChiaPing Tsai
>            Priority: Minor
>         Attachments: HBASE-16224-v1.patch, HBASE-16224-v2.patch, 
> HBASE-16224-v3.patch, HBASE-16224-v4.patch, HBASE-16224-v5.patch, 
> HBASE-16224-v6.patch, HBASE-16224-v7.patch, HBASE-16224-v8.patch, 
> HBASE-16224-v9.patch, experiment-v9.patch.xlsx, experiment.xlsx
>
>
> This patch is proposed to reduce the number of RPCs for large PUTs.
> The number and data size of write threads (SingleServerRequestRunnable) are 
> the result of three main factors:
> 1) The flush size taken by BufferedMutatorImpl#backgroundFlushCommits
> 2) The limit on the number of tasks
> 3) ClientBackoffPolicy
> Many requests containing few mutations are created for two reasons: 
> 1) Many regions of the target table are on different servers.
> 2) The flush size in step one is summed over “all” servers rather than per 
> “individual” server.
> This patch removes the flush size limit in step one and adds a maximum 
> submit size for each server in the AsyncProcess.
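
The effect of a limit summed over all servers versus a limit per server can 
be sketched as follows. This is an illustration under simplifying 
assumptions, not the patch itself: PerServerLimitSketch, Mutation and both 
grouping methods are hypothetical names, and sizes are abstract units.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are NOT HBase classes.
final class PerServerLimitSketch {

    static final class Mutation {
        final String server; // server assumed to host the target region
        final long size;     // payload size in abstract units

        Mutation(String server, long size) {
            this.server = server;
            this.size = size;
        }
    }

    /** One cap summed over ALL servers: stop as soon as the total would exceed it. */
    static Map<String, Long> groupWithGlobalCap(List<Mutation> buffer, long cap) {
        Map<String, Long> perServer = new LinkedHashMap<>();
        long total = 0;
        for (Mutation m : buffer) {
            if (total + m.size > cap) {
                break;
            }
            total += m.size;
            perServer.merge(m.server, m.size, Long::sum);
        }
        return perServer;
    }

    /** One cap per INDIVIDUAL server: scan everything, fill each server up to the cap. */
    static Map<String, Long> groupWithPerServerCap(List<Mutation> buffer, long cap) {
        Map<String, Long> perServer = new LinkedHashMap<>();
        for (Mutation m : buffer) {
            long current = perServer.getOrDefault(m.server, 0L);
            if (current + m.size <= cap) {
                perServer.put(m.server, current + m.size);
            }
            // Otherwise the mutation would stay buffered for a later round.
        }
        return perServer;
    }

    public static void main(String[] args) {
        String[] servers = {"rs1", "rs2", "rs3"};
        List<Mutation> buffer = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            buffer.add(new Mutation(servers[i % 3], 1)); // round-robin, size 1 each
        }
        System.out.println(groupWithGlobalCap(buffer, 6));
        System.out.println(groupWithPerServerCap(buffer, 6));
    }
}
```

With 12 mutations of size 1 spread round-robin over three servers and a cap 
of 6, the global cap yields a request of size 2 per server, while the 
per-server cap lets each server receive all 4 of its mutations in one 
request.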



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
