jichen created PHOENIX-6698:
-------------------------------

             Summary: hive-connector will take long time to generate splits for 
large phoenix tables.
                 Key: PHOENIX-6698
                 URL: https://issues.apache.org/jira/browse/PHOENIX-6698
             Project: Phoenix
          Issue Type: Improvement
          Components: hive-connector
    Affects Versions: 5.1.0
            Reporter: jichen
            Assignee: jichen
             Fix For: connectors-6.0.0


In our production environment, the hive-phoenix connector takes nearly 30-40 
minutes to generate splits for a large Phoenix table with more than 2048 
regions. Performance is even worse if splitByStats is enabled. The cause is 
that the 'generateSplits' function in class PhoenixInputFormat uses only one 
thread to generate splits for each scan. My proposal is to generate the splits 
in parallel with multiple threads. The proposal has been validated in our 
production environment: after changing the code to generate splits in parallel 
with 24 threads, the time cost dropped to 3-4 minutes.
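
To illustrate the proposal, here is a minimal sketch of fanning per-scan split 
generation out over a fixed thread pool. It is not the actual patch: the class 
ParallelSplitSketch, the method generateSplitsInParallel and the perScan 
function are hypothetical stand-ins for the per-scan work done inside 
PhoenixInputFormat, and only standard java.util.concurrent calls are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelSplitSketch {

    /**
     * Applies perScan to every scan on a fixed-size thread pool and returns
     * the concatenated results in the original scan order.
     * perScan is a hypothetical stand-in for the per-scan split logic.
     */
    public static <S, T> List<T> generateSplitsInParallel(
            List<S> scans, Function<S, List<T>> perScan, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // One task per scan; each task produces that scan's splits.
            List<Callable<List<T>>> tasks = new ArrayList<>();
            for (S scan : scans) {
                tasks.add(() -> perScan.apply(scan));
            }
            List<T> splits = new ArrayList<>();
            // invokeAll blocks until all tasks finish and returns the
            // futures in the same order as the submitted tasks.
            for (Future<List<T>> future : pool.invokeAll(tasks)) {
                splits.addAll(future.get());
            }
            return splits;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            throw new IllegalStateException("Split generation interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Split generation failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The thread count is a parameter here; the run described above used 24 threads. 
Collecting results in submission order keeps the split list in scan order, so 
the output matches what a single-threaded loop would produce.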



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
