[ 
https://issues.apache.org/jira/browse/HIVE-15368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15730097#comment-15730097
 ] 

Sergey Shelukhin commented on HIVE-15368:
-----------------------------------------

pool.shutdown/shutdownNow seems to be called twice: once before the 
future.get() calls and again in the finally block.
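
For comparison, here is a minimal sketch of a pool that is shut down in exactly 
one place. It is not the code from the patch: the class name 
SingleShutdownSketch and the helper runAll are hypothetical, and the tasks are 
plain Callables rather than the directory listings done in Utilities.java.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SingleShutdownSketch {
  // Hypothetical helper (not from Utilities.java): submits all tasks, waits
  // for every result, and shuts the pool down once, in the finally block.
  static <T> List<T> runAll(List<Callable<T>> tasks, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<T>> futures = new ArrayList<>();
      for (Callable<T> task : tasks) {
        futures.add(pool.submit(task));
      }
      List<T> results = new ArrayList<>();
      for (Future<T> future : futures) {
        results.add(future.get()); // blocks until this task has finished
      }
      return results;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      throw new IllegalStateException("interrupted while waiting", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("task failed", e.getCause());
    } finally {
      // Single cleanup point. On success nothing is left to cancel; on
      // failure this interrupts and discards the remaining tasks.
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    List<Callable<Integer>> tasks = new ArrayList<>();
    for (int i = 0; i < 4; i++) {
      final int n = i;
      tasks.add(() -> n * n);
    }
    System.out.println(runAll(tasks, 2));
  }
}
```

With the shutdown only in finally, the success and failure paths share one 
cleanup point, so no second call is needed before the future.get() loop.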

I am not sure the change is valid. It seems that we no longer clean up MM 
directories at all if we found all committed files in the recursive variant 
of get... Am I missing something? We still need to look at files that are NOT 
committed, and delete them.
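
To make the requirement concrete, here is a rough sketch of the cleanup I 
mean. It uses java.nio.file on a local temp directory as a stand-in for the 
Hadoop FileSystem calls, and the names UncommittedCleanupSketch, 
deleteUncommitted and demo are hypothetical, as are the file names. It only 
illustrates "list everything, delete whatever is not in the committed set"; it 
is not the actual MM directory handling.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class UncommittedCleanupSketch {
  // Hypothetical helper: deletes every regular file directly under dir whose
  // name is not in committedNames, and returns the deleted names, sorted.
  static List<String> deleteUncommitted(Path dir, Set<String> committedNames) {
    List<Path> toDelete = new ArrayList<>();
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
      for (Path entry : entries) {
        String name = entry.getFileName().toString();
        if (Files.isRegularFile(entry) && !committedNames.contains(name)) {
          toDelete.add(entry);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    List<String> deleted = new ArrayList<>();
    for (Path entry : toDelete) {
      try {
        Files.delete(entry);
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
      deleted.add(entry.getFileName().toString());
    }
    Collections.sort(deleted);
    return deleted;
  }

  // Builds a temp directory with one committed and one uncommitted file
  // (made-up names), runs the cleanup, and returns what was deleted.
  static List<String> demo() {
    try {
      Path dir = Files.createTempDirectory("mm_sketch");
      Files.createFile(dir.resolve("000000_0"));
      Files.createFile(dir.resolve("000001_0"));
      Set<String> committed = new HashSet<>();
      committed.add("000000_0");
      return deleteUncommitted(dir, committed);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("deleted: " + demo());
  }
}
```

The point is that finding every committed file does not make the listing 
unnecessary: the uncommitted leftovers only show up when the directory 
contents are compared against the committed set.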

Also, the logic is now rather opaque. The globStatus path returns 
directories, but the recursive path modifies the committed set on the fly, 
and directories are then ignored. Perhaps the two paths should be separated 
at a higher level than the get... method.

> consider optimizing Utilities::handleMmTableFinalPath
> -----------------------------------------------------
>
>                 Key: HIVE-15368
>                 URL: https://issues.apache.org/jira/browse/HIVE-15368
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: hive-14535
>            Reporter: Rajesh Balamohan
>         Attachments: HIVE-15368.branch.14535.1.patch
>
>
> Branch: hive-14535
> https://github.com/apache/hive/blob/hive-14535/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L4049
> When running "insert overwrite...on partitioned table" with 2000+ partitions, 
> a good amount of time (~245 seconds) was spent iterating over every 
> mmDirectory entry and checking its file listings in S3. Creating this jira to 
> consider optimizing this codepath, as information from 
> {{getMmDirectoryCandidates}} could be used to reduce the number of times S3 
> needs to be contacted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
