[ 
https://issues.apache.org/jira/browse/PHOENIX-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jichen updated PHOENIX-6698:
----------------------------
    Description: In our production environment, the hive-phoenix connector 
takes nearly 30-40 minutes to generate splits for a large Phoenix table with 
more than 2048 regions. Performance is even worse when splitByStats is 
enabled. The cause is that the function 'generateSplits' in class 
PhoenixInputFormat uses only one thread to generate the splits for each scan. 
My proposal is to generate the splits in parallel with multiple threads. The 
proposal has been validated in our production environment: after changing the 
code to generate splits in parallel with 24 threads, the time cost is reduced 
to 2 minutes.  (was: In our production environment, the hive-phoenix connector 
takes nearly 30-40 minutes to generate splits for a large Phoenix table with 
more than 2048 regions. Performance is even worse when splitByStats is 
enabled. The cause is that the function 'generateSplits' in class 
PhoenixInputFormat uses only one thread to generate the splits for each scan. 
My proposal is to generate the splits in parallel with multiple threads. The 
proposal has been validated in our production environment: after changing the 
code to generate splits in parallel with 24 threads, the time cost is reduced 
to 3-4 minutes.)

> hive-connector will take long time to generate splits for large phoenix 
> tables.
> -------------------------------------------------------------------------------
>
>                 Key: PHOENIX-6698
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6698
>             Project: Phoenix
>          Issue Type: Improvement
>          Components: hive-connector
>    Affects Versions: 5.1.0
>            Reporter: jichen
>            Assignee: jichen
>            Priority: Minor
>             Fix For: connectors-6.0.0
>
>
> In our production environment, the hive-phoenix connector takes nearly 30-40 
> minutes to generate splits for a large Phoenix table with more than 2048 
> regions. Performance is even worse when splitByStats is enabled. The cause 
> is that the function 'generateSplits' in class PhoenixInputFormat uses only 
> one thread to generate the splits for each scan. My proposal is to generate 
> the splits in parallel with multiple threads. The proposal has been 
> validated in our production environment: after changing the code to generate 
> splits in parallel with 24 threads, the time cost is reduced to 2 minutes.
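
The proposal in the description above could look roughly like the following
sketch. It is not the actual patch and does not use the Phoenix API: the class
ParallelSplitSketch, the method generateSplitsInParallel, and the
splitsForScan callback are hypothetical names introduced only for
illustration, and the per-scan work is assumed to be independent and safe to
run concurrently. Each scan is submitted to a fixed-size thread pool, and the
per-scan results are concatenated in the original scan order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical illustration only; not part of Phoenix or the hive connector.
public class ParallelSplitSketch {

    /**
     * Runs splitsForScan for every scan on a fixed-size thread pool and
     * returns all splits, concatenated in the original scan order.
     *
     * S stands in for a scan type and T for a split type (both assumed).
     */
    public static <S, T> List<T> generateSplitsInParallel(
            List<S> scans, Function<S, List<T>> splitsForScan, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // One task per scan; each task computes that scan's splits.
            List<Callable<List<T>>> tasks = new ArrayList<>();
            for (S scan : scans) {
                tasks.add(() -> splitsForScan.apply(scan));
            }
            List<T> splits = new ArrayList<>();
            // invokeAll blocks until all tasks finish and returns the
            // futures in the same order as the submitted tasks.
            for (Future<List<T>> future : pool.invokeAll(tasks)) {
                splits.addAll(future.get());
            }
            return splits;
        } finally {
            // Always release the worker threads, even if a task failed.
            pool.shutdown();
        }
    }
}
```

With the figure from the report, a caller would pass 24 as the thread count.
Whether that number suits another cluster is an open question, so a real
change would probably make it configurable.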



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
