[jira] [Created] (PHOENIX-6677) Parallelism within a batch of mutations

Kadir OZDEMIR (Jira) Mon, 28 Mar 2022 16:51:07 -0700

Kadir OZDEMIR created PHOENIX-6677:
--------------------------------------

             Summary: Parallelism within a batch of mutations 
                 Key: PHOENIX-6677
                 URL: https://issues.apache.org/jira/browse/PHOENIX-6677
             Project: Phoenix
          Issue Type: Improvement
            Reporter: Kadir OZDEMIR
             Fix For: 4.17.0, 5.2.0



Currently, the Phoenix client simply passes batches of row mutations from the
application to the HBase client without any parallelism or intelligent grouping
(except for grouping mutations to the same row).

Assume that the application creates batches of 10,000 row mutations for a given
table. The Phoenix client divides these rows, in their arrival order, into
HBase batches of n (e.g., 100) rows according to the configured batch size
limits, i.e., the number of rows and the number of bytes. Phoenix then calls the
HBase batch API one batch at a time (i.e., serially). The HBase client further
divides each batch into smaller batches based on the regions its rows belong
to. This means that a large batch created by the application is divided into
many tiny batches that are executed mostly serially. For salted tables, whose
rows are spread across regions by a hash-based salt prefix, this results in
even smaller batches.
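
To make the current flow concrete, here is a minimal, self-contained sketch of
the arrival-order splitting described above. It is not Phoenix code: the `Row`
record, the `chunk` method, and the limits used in `main` (100 rows, 2 MB) are
hypothetical stand-ins for the configured row-count and byte-size limits. Each
resulting chunk would correspond to one serial call to the HBase batch API.
Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

public class SerialBatching {
    // Hypothetical stand-in for a row mutation: its row key and serialized size.
    record Row(byte[] key, long sizeBytes) {}

    // Splits rows, in arrival order, into chunks that respect both limits.
    // A new chunk starts when the current one already holds maxRows rows, or
    // when adding the next row would push it past maxBytes.
    static List<List<Row>> chunk(List<Row> rows, int maxRows, long maxBytes) {
        List<List<Row>> chunks = new ArrayList<>();
        List<Row> current = new ArrayList<>();
        long currentBytes = 0;
        for (Row row : rows) {
            boolean full = current.size() == maxRows
                    || (!current.isEmpty() && currentBytes + row.sizeBytes() > maxBytes);
            if (full) {
                chunks.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(row);
            currentBytes += row.sizeBytes();
        }
        if (!current.isEmpty()) {
            chunks.add(current);
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            rows.add(new Row(Integer.toString(i).getBytes(), 200));
        }
        List<List<Row>> chunks = chunk(rows, 100, 2 * 1024 * 1024);
        // Each chunk would be sent to the HBase batch API one at a time (serially),
        // and the HBase client would split it again by region.
        System.out.println(chunks.size() + " serial batch calls");
    }
}
```

With 10,000 rows of 200 bytes each, the row limit is hit first, so this prints
`100 serial batch calls`; only the rows inside one chunk that share a region
travel together.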

We can improve the current implementation greatly by grouping the rows of each
batch prepared by the application into sub-batches based on table region
boundaries and then executing these sub-batches in parallel.
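
A minimal sketch of the proposed approach, assuming the region start keys are
already known (a real client would obtain them from the table's region
locations). Each row key is assigned to the region with the greatest start key
not exceeding it, and each non-empty group is submitted to a thread pool.
`groupByRegion`, `executeInParallel`, and `applyBatch` are hypothetical names,
not Phoenix or HBase APIs; `applyBatch` only counts rows where a real
implementation would send the sub-batch to HBase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class RegionParallelBatching {
    // Groups row keys by owning region. regionStartKeys must contain the start
    // key of every region, including the empty start key of the first region.
    static TreeMap<byte[], List<byte[]>> groupByRegion(List<byte[]> rowKeys,
                                                       List<byte[]> regionStartKeys) {
        // Unsigned lexicographic order, the same order HBase uses for row keys.
        TreeMap<byte[], List<byte[]>> groups = new TreeMap<>(Arrays::compareUnsigned);
        for (byte[] start : regionStartKeys) {
            groups.put(start, new ArrayList<>());
        }
        for (byte[] key : rowKeys) {
            // Owning region: greatest start key <= row key.
            groups.floorEntry(key).getValue().add(key);
        }
        groups.values().removeIf(List::isEmpty);
        return groups;
    }

    // Runs one task per region sub-batch and waits for all of them.
    static int executeInParallel(Map<byte[], List<byte[]>> groups, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (List<byte[]> subBatch : groups.values()) {
                futures.add(pool.submit(() -> applyBatch(subBatch)));
            }
            int applied = 0;
            for (Future<Integer> f : futures) {
                applied += f.get(); // rethrows any sub-batch failure
            }
            return applied;
        } finally {
            pool.shutdown();
        }
    }

    // Hypothetical placeholder for sending one region's sub-batch to HBase.
    static int applyBatch(List<byte[]> subBatch) {
        return subBatch.size();
    }

    public static void main(String[] args) throws Exception {
        List<byte[]> regionStarts = List.of(new byte[0], "g".getBytes(), "p".getBytes());
        List<byte[]> keys = List.of("a".getBytes(), "h".getBytes(), "q".getBytes(),
                "b".getBytes(), "z".getBytes());
        TreeMap<byte[], List<byte[]>> groups = groupByRegion(keys, regionStarts);
        System.out.println(groups.size() + " sub-batches, "
                + executeInParallel(groups, 4) + " rows applied");
    }
}
```

Here the five keys fall into three regions, so the program prints
`3 sub-batches, 5 rows applied`. Because each sub-batch already lies within a
single region, the HBase client should not need to split it further, and the
pool size bounds how many sub-batches are in flight at once.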



