[ 
https://issues.apache.org/jira/browse/HIVE-14953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15593627#comment-15593627
 ] 

Rajesh Balamohan commented on HIVE-14953:
-----------------------------------------

[~sershe] - It was in FileSinkOperator.handleMMTable (getMmDirectoryCandidates) 
specifically. I no longer see that code path in the latest code on the branch. 
globStatus with a pattern should be replaced with {{listFiles(path, boolean 
recursive)}}, with any additional filtering pattern applied on the client 
side. On cloud storage systems, a recursive listing can be served as a prefix 
listing, which reduces the number of calls significantly compared to 
globStatus, which walks the files one at a time on the client side.
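
To make the call-count argument concrete, here is a minimal sketch in plain Java. It is a toy model, not Hive or Hadoop code: the in-memory store, the class name ObjectStoreSketch, the key layout (tbl/mm_1/...) and the listCalls counter are all made up for illustration. It compares a walk that expands a pattern one path component at a time against a single recursive prefix listing followed by client-side filtering (in Hadoop, FileSystem.listFiles(Path, boolean recursive) is one such recursive call).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.function.Predicate;

// Toy model only: an in-memory "object store" with a flat, sorted key space.
final class ObjectStoreSketch {
    private final TreeSet<String> keys = new TreeSet<>();
    int listCalls = 0; // number of simulated listing requests

    // Hypothetical sample layout; the directory names are invented.
    static ObjectStoreSketch sample() {
        ObjectStoreSketch s = new ObjectStoreSketch();
        s.keys.add("tbl/mm_1/000000_0");
        s.keys.add("tbl/mm_1/000001_0");
        s.keys.add("tbl/mm_2/000000_0");
        s.keys.add("tbl/_tmp.mm_3/000000_0");
        return s;
    }

    // One simulated request: every key under a prefix (a recursive listing).
    List<String> listRecursive(String prefix) {
        listCalls++;
        List<String> out = new ArrayList<>();
        for (String k : keys.tailSet(prefix)) {
            if (!k.startsWith(prefix)) break; // sorted keys: the prefix range has ended
            out.add(k);
        }
        return out;
    }

    // One simulated request: the immediate children of a "directory".
    List<String> listChildren(String dir) {
        listCalls++;
        String prefix = dir + "/";
        TreeSet<String> children = new TreeSet<>();
        for (String k : keys.tailSet(prefix)) {
            if (!k.startsWith(prefix)) break;
            int slash = k.indexOf('/', prefix.length());
            children.add(slash < 0 ? k : k.substring(0, slash));
        }
        return new ArrayList<>(children);
    }

    // Simplified stand-in for expanding root/<dirPrefix>*/* level by level.
    static List<String> walkLevelByLevel(ObjectStoreSketch store, String root, String dirPrefix) {
        List<String> files = new ArrayList<>();
        for (String dir : store.listChildren(root)) {
            if (dir.startsWith(root + "/" + dirPrefix)) {
                files.addAll(store.listChildren(dir)); // one extra request per matching directory
            }
        }
        return files;
    }

    // Single recursive listing, with the pattern applied on the client side.
    static List<String> listAndFilter(ObjectStoreSketch store, String prefix, Predicate<String> filter) {
        List<String> matched = new ArrayList<>();
        for (String key : store.listRecursive(prefix)) {
            if (filter.test(key)) matched.add(key);
        }
        return matched;
    }

    public static void main(String[] args) {
        ObjectStoreSketch a = sample();
        ObjectStoreSketch b = sample();
        System.out.println(walkLevelByLevel(a, "tbl", "mm_") + " calls=" + a.listCalls);
        System.out.println(listAndFilter(b, "tbl/", k -> k.startsWith("tbl/mm_")) + " calls=" + b.listCalls);
    }
}
```

In this model the level-by-level walk issues one request for the root plus one per matching directory, while the recursive listing issues a single request regardless of how many directories match. Real stores page their results, so the actual savings depend on the listing size.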

> don't use globStatus on S3 in MM tables
> ---------------------------------------
>
>                 Key: HIVE-14953
>                 URL: https://issues.apache.org/jira/browse/HIVE-14953
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Rajesh Balamohan
>            Assignee: Sergey Shelukhin
>             Fix For: hive-14535
>
>         Attachments: HIVE-14953.patch
>
>
> Need to investigate if recursive get is faster. Also, normal listStatus might 
> suffice because MM code handles directory structure in a more definite manner 
> than old code; so it knows where the files of interest are to be found.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
