[ 
https://issues.apache.org/jira/browse/HIVE-23521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113746#comment-17113746
 ] 

Rajesh Balamohan commented on HIVE-23521:
-----------------------------------------

Batching is one option, but we need to start considering data copy as well.

For the metadata-only case, it ended up running for 3.5 hours for 10K partitions. 
With the patch, it completes in 350-380 seconds!
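
As a rough illustration of why batching helps (this is not the code in the attached patch), the sketch below compares one call per partition against grouped calls. CountingMetastore, loadSequentially, loadInBatches, makeSpecs and the batch size of 1,000 are all hypothetical names and values chosen for the example; the class is only a stand-in that counts round trips and does not talk to a real HMS.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedPartitionLoad {

    /** Hypothetical stand-in for a metastore client; it only counts round trips. */
    static class CountingMetastore {
        int calls = 0;       // number of simulated round trips
        int registered = 0;  // number of partitions registered so far

        // One round trip per partition.
        void addPartition(String spec) {
            calls++;
            registered++;
        }

        // One round trip for a whole group of partitions.
        void addPartitions(List<String> specs) {
            calls++;
            registered += specs.size();
        }
    }

    /** Sequential pattern: every partition costs one call. */
    static void loadSequentially(CountingMetastore hms, List<String> specs) {
        for (String spec : specs) {
            hms.addPartition(spec);
        }
    }

    /** Batched pattern: partitions are grouped and sent batchSize at a time. */
    static void loadInBatches(CountingMetastore hms, List<String> specs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        for (int start = 0; start < specs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, specs.size());
            hms.addPartitions(specs.subList(start, end));
        }
    }

    /** Builds n made-up partition specs such as "p=0", "p=1", ... */
    static List<String> makeSpecs(int n) {
        List<String> specs = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            specs.add("p=" + i);
        }
        return specs;
    }

    public static void main(String[] args) {
        List<String> specs = makeSpecs(10_000);

        CountingMetastore sequential = new CountingMetastore();
        loadSequentially(sequential, specs);

        CountingMetastore batched = new CountingMetastore();
        loadInBatches(batched, specs, 1_000);

        System.out.println("sequential calls: " + sequential.calls); // 10000
        System.out.println("batched calls: " + batched.calls);       // 10
    }
}
```

The batch size is a tunable assumption: larger batches mean fewer round trips, but each request carries more partitions.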

> REPL: Optimise partition loading during bootstrap
> -------------------------------------------------
>
>                 Key: HIVE-23521
>                 URL: https://issues.apache.org/jira/browse/HIVE-23521
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Major
>         Attachments: HIVE-23521.1.patch
>
>
> When bootstrapping from a large "REPL dump" with ~10K partitions, the load 
> executes "addPartition" sequentially and takes a very long time, because every 
> call communicates with HMS, registers the partition, etc.
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadPartitions.java#L399]
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadPartitions.java#L165]
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadPartitions.java#L210]
> When bootstrap loading has to deal with DDL, it would be good to collate all 
> partitions into a single call to HMS. This would help reduce the overall runtime.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
