[ 
https://issues.apache.org/jira/browse/HIVE-9367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276435#comment-14276435
 ] 

Jimmy Xiang commented on HIVE-9367:
-----------------------------------

With the FileStatus already in hand, we don't need to go to the NameNode (NN)
to fetch it again, since FileStatus already records whether the path is a
file or a directory. Originally, getDirIndices fetched the FileStatus a second
time, which cost an extra call for each file. This patch saves that call.
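
To illustrate the idea, here is a minimal, self-contained sketch. It is not
the code from HIVE-9367.1.patch. Status and CountingFs are hypothetical
stand-ins for Hadoop's FileStatus and a file system client, and the two
dirIndices* methods are illustrative names. The call counter only models
round trips to the NN.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DirIndexSketch {
    /** Hypothetical stand-in for FileStatus: a path plus an is-directory flag. */
    static final class Status {
        final String path;
        final boolean dir;
        Status(String path, boolean dir) { this.path = path; this.dir = dir; }
    }

    /** Hypothetical file system client that counts simulated NN round trips. */
    static final class CountingFs {
        private final Map<String, Boolean> entries = new LinkedHashMap<>();
        int calls = 0;

        void add(String path, boolean dir) { entries.put(path, dir); }

        /** One call returns the status of every known entry (like a listing). */
        List<Status> listStatus() {
            calls++;
            List<Status> out = new ArrayList<>();
            for (Map.Entry<String, Boolean> e : entries.entrySet()) {
                out.add(new Status(e.getKey(), e.getValue()));
            }
            return out;
        }

        /** One call per path: the lookup the comment describes as redundant. */
        Status getFileStatus(String path) {
            calls++;
            return new Status(path, entries.getOrDefault(path, false));
        }
    }

    /** Before: re-fetch each status just to learn whether it is a directory. */
    static List<Integer> dirIndicesWithLookup(CountingFs fs, List<Status> statuses) {
        List<Integer> dirs = new ArrayList<>();
        for (int i = 0; i < statuses.size(); i++) {
            if (fs.getFileStatus(statuses.get(i).path).dir) {
                dirs.add(i);
            }
        }
        return dirs;
    }

    /** After: reuse the flag each status already carries; no further calls. */
    static List<Integer> dirIndicesFromStatus(List<Status> statuses) {
        List<Integer> dirs = new ArrayList<>();
        for (int i = 0; i < statuses.size(); i++) {
            if (statuses.get(i).dir) {
                dirs.add(i);
            }
        }
        return dirs;
    }
}
```

In this model, the "before" variant makes one extra simulated call per path,
while the "after" variant makes none beyond the initial listing.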

> CombineFileInputFormatShim#getDirIndices is expensive
> -----------------------------------------------------
>
>                 Key: HIVE-9367
>                 URL: https://issues.apache.org/jira/browse/HIVE-9367
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Jimmy Xiang
>            Assignee: Jimmy Xiang
>         Attachments: HIVE-9367.1.patch
>
>
> [~lirui] found that we spend quite some time in
> CombineFileInputFormatShim#getDirIndices. I looked into it, and it seems to
> me we should be able to get rid of this method completely if we can enhance
> CombineFileInputFormatShim a little.



