[jira] [Comment Edited] (PHOENIX-6677) Parallelism within a batch of mutations

Istvan Toth (Jira) Thu, 09 Jun 2022 21:17:07 -0700


    [ 
https://issues.apache.org/jira/browse/PHOENIX-6677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552531#comment-17552531
 ]


Istvan Toth edited comment on PHOENIX-6677 at 6/10/22 4:16 AM:
---------------------------------------------------------------

Thank you for the detailed explanation, [~kozdemir].

I've indeed made the mistake of trusting the comments, and not digging deep 
enough into the code.
Checking the call tree of getMutateBatchSize() confirms that you are right.

Does having two sets of properties make any sense (i.e., is there a case where 
the non-batch properties are used for anything)?


> Parallelism within a batch of mutations 
> ----------------------------------------
>
>                 Key: PHOENIX-6677
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6677
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Kadir OZDEMIR
>            Priority: Major
>
> Currently, the Phoenix client simply passes batches of row mutations from the 
> application to the HBase client without any parallelism or intelligent 
> grouping (except grouping mutations for the same row). 
> Assume that the application creates batches of 10000 row mutations for a 
> given table. The Phoenix client divides these rows, in arrival order, into 
> HBase batches of n (e.g., 100) rows based on the configured batch size, i.e., 
> the number of rows and bytes. Then, Phoenix calls the HBase batch API, one 
> batch at a time (i.e., serially). The HBase client further divides a given 
> batch of rows into smaller batches based on their regions. This means that a 
> large batch created by the application is divided into many tiny batches and 
> executed mostly serially. For salted tables, this will result in even smaller 
> batches. 
> We can improve the current implementation greatly if we group the rows of the 
> batch prepared by the application into sub-batches based on table region 
> boundaries and then execute these batches in parallel. 
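
The grouping proposed in the description could be sketched roughly as below. 
This is only an illustration, not Phoenix or HBase code: the class and method 
names are hypothetical, row keys and region start keys are plain strings 
standing in for byte arrays, and the per-region "batch call" is a stand-in 
that just counts rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: none of these names exist in Phoenix or HBase.
class RegionBatchSketch {

    // Groups row keys by region. A row belongs to the region with the
    // greatest start key that is <= the row key. The first region is
    // assumed to have the empty string as its start key.
    static Map<String, List<String>> groupByRegion(List<String> regionStartKeys,
                                                   List<String> rowKeys) {
        TreeMap<String, List<String>> groups = new TreeMap<>();
        for (String start : regionStartKeys) {
            groups.put(start, new ArrayList<>());
        }
        for (String row : rowKeys) {
            String region = groups.floorKey(row);
            if (region == null) {
                throw new IllegalArgumentException("No region covers row " + row);
            }
            groups.get(region).add(row); // arrival order is kept within a region
        }
        groups.values().removeIf(List::isEmpty); // drop regions with no rows
        return groups;
    }

    // Submits one task per region group and waits for all of them.
    // Returns the total number of rows "written" by the stand-in tasks.
    static int executeInParallel(Map<String, List<String>> groups, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (List<String> batch : groups.values()) {
                // Stand-in for a real per-region batch call.
                Callable<Integer> task = () -> batch.size();
                futures.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> f : futures) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Parallel batch execution failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> groups = groupByRegion(
                Arrays.asList("", "m"),
                Arrays.asList("apple", "zebra", "mango", "kiwi"));
        System.out.println(groups);
        System.out.println(executeInParallel(groups, 2));
    }
}
```

A real implementation would also have to handle region boundaries that change 
between grouping and execution, and would still need to respect the configured 
batch size limits within each region group.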



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
