[ https://issues.apache.org/jira/browse/PHOENIX-6698 ]


    jichen deleted comment on PHOENIX-6698:
    ---------------------------------

was (Author: jichen0919):
PHOENIX-6698: hive-connector takes a long time to generate splits for large 
Phoenix tables.
This patch enables PhoenixInputFormat to generate splits in parallel. It 
introduces two parameters to control the degree of parallelism.
1. 'hive.phoenix.split.parallel.threshold' controls whether splits are 
generated in parallel. Splits are generated serially under either of the 
following conditions:
(1) hive.phoenix.split.parallel.threshold < 0.
(2) The number of scans in the query plan is less than the configured value.
In all other cases, splits are generated in parallel.
2. 'hive.phoenix.split.parallel.level' controls the number of worker threads 
used to generate the splits (2 * CPU cores by default).
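
The decision rule and the worker pool described above can be sketched roughly 
as follows. This is only an illustration and not the code from the attached 
patch: the class ParallelSplitSketch, its method names, and the generic 
scan/split types S and T are hypothetical stand-ins, and a plain 
java.util.concurrent fixed thread pool is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Illustrative sketch only; names and types are hypothetical, not from the patch. */
class ParallelSplitSketch {

    /** Serial when threshold < 0 or when the scan count is below the threshold. */
    static boolean useParallel(int scanCount, int threshold) {
        return threshold >= 0 && scanCount >= threshold;
    }

    /** Default worker count as described in the comment: 2 * CPU cores. */
    static int defaultParallelLevel() {
        return 2 * Runtime.getRuntime().availableProcessors();
    }

    /** Turns each scan into a split, serially or on a fixed-size thread pool. */
    static <S, T> List<T> generateSplits(List<S> scans, Function<S, T> toSplit,
                                         int threshold, int parallelLevel) throws Exception {
        List<T> splits = new ArrayList<>(scans.size());
        if (!useParallel(scans.size(), threshold)) {
            for (S scan : scans) {
                splits.add(toSplit.apply(scan));
            }
            return splits;
        }
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, parallelLevel));
        try {
            List<Callable<T>> tasks = new ArrayList<>();
            for (S scan : scans) {
                tasks.add(() -> toSplit.apply(scan));
            }
            // invokeAll returns futures in the same order as the submitted tasks,
            // so the split order matches the serial path.
            for (Future<T> future : pool.invokeAll(tasks)) {
                splits.add(future.get());
            }
        } finally {
            pool.shutdown();
        }
        return splits;
    }
}
```

In this sketch a threshold of 0 always selects the parallel path, and a scan 
count equal to the threshold is treated as parallel, since only counts below 
the threshold are listed as serial.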

 

Unit tests show that generating splits in parallel yields more than 2x the 
performance of the serial counterpart.

> hive-connector will take long time to generate splits for large phoenix 
> tables.
> -------------------------------------------------------------------------------
>
>                 Key: PHOENIX-6698
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6698
>             Project: Phoenix
>          Issue Type: Improvement
>          Components: hive-connector
>    Affects Versions: 5.1.0
>            Reporter: jichen
>            Assignee: jichen
>            Priority: Minor
>             Fix For: connectors-6.0.0
>
>         Attachments: PHOENIX-6698.master.v1.patch
>
>
> In our production environment, the hive-phoenix connector takes nearly 30-40 
> minutes to generate splits for a large Phoenix table that has more than 2048 
> regions. This is because the 'generateSplits' function in class 
> PhoenixInputFormat uses only one thread to generate splits for each scan. 
> My proposal is to use multiple threads to generate splits in parallel. The 
> proposal has been validated in our production environment: by changing the 
> code to generate splits in parallel with 24 threads, the time cost is reduced 
> to 2 minutes.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
