[ https://issues.apache.org/jira/browse/HADOOP-17428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302431#comment-17302431 ]

Steve Loughran commented on HADOOP-17428:
-----------------------------------------

It turns out that Hive does use this for some of its calculations of the size of
unmanaged tables, so its performance *does* matter, at least until/unless we
can move Hive off it. Apparently so does Spark.

Personally, I don't think they should be using it, as it does an expensive
treewalk, but they probably aren't aware of its cost.
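The following is a minimal Java sketch (Java 16+ for records) of the kind of recursive treewalk a generic getContentSummary performs. It makes the cost concrete. The in-memory `Store`, `Entry` and `Summary` types are hypothetical stand-ins, not Hadoop classes. Each `list()` call models one remote listing request, and the totals are loosely modelled on ContentSummary's file count, directory count and length.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class TreewalkCost {
    // Hypothetical listing entry: a file (with a length) or a subdirectory.
    public record Entry(String path, boolean isDir, long length) {}

    // Hypothetical totals, loosely modelled on ContentSummary.
    public record Summary(long fileCount, long directoryCount, long length) {
        public Summary plus(Summary o) {
            return new Summary(fileCount + o.fileCount,
                    directoryCount + o.directoryCount, length + o.length);
        }
    }

    // Hypothetical in-memory store: directory path -> immediate children.
    // Each list() stands in for one remote LIST request and is counted.
    public static final class Store {
        private final Map<String, List<Entry>> children;
        public final AtomicInteger listCalls = new AtomicInteger();

        public Store(Map<String, List<Entry>> children) {
            this.children = children;
        }

        public List<Entry> list(String dir) {
            listCalls.incrementAndGet();
            return children.getOrDefault(dir, List.of());
        }
    }

    // Sequential treewalk: one list call per directory, strictly one after another.
    // As in the generic FileSystem treewalk, the starting directory counts itself.
    public static Summary treewalk(Store store, String dir) {
        Summary total = new Summary(0, 1, 0);
        for (Entry e : store.list(dir)) {
            total = total.plus(e.isDir()
                    ? treewalk(store, e.path())
                    : new Summary(1, 0, e.length()));
        }
        return total;
    }
}
```

For a tree of N directories, the walk issues N sequential list calls in this model. A real store issues at least that many, because listings can be paged. Against a remote store, each call is a network round trip, so latency grows with the directory count. Callers such as Hive never see that cost.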

Could abfs do some of the treewalk in parallel? For s3a, I think I'd go for the
deep listing (listFiles(path, true)) and derive an approximate directory count from it.
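Both options can be sketched against the same kind of hypothetical in-memory store (Java 16+). The names `Store`, `Entry`, `Summary`, `parallelTreewalk` and `fromDeepListing` are illustrative only. They are not abfs or s3a APIs.

- `parallelTreewalk` still makes one list call per directory, but it lists sibling directories concurrently. It uses parallel streams as a stand-in for a real, bounded thread pool.
- `fromDeepListing` assumes it receives the flat, files-only result of a recursive listing such as `FileSystem.listFiles(path, true)`. It derives the directory count from the distinct parent paths. Empty directories never appear in such a listing, so that count is a lower bound.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

public class SummaryStrategies {
    // Hypothetical listing entry: a file (with a length) or a subdirectory.
    public record Entry(String path, boolean isDir, long length) {}

    // Hypothetical totals, loosely modelled on ContentSummary.
    public record Summary(long fileCount, long directoryCount, long length) {
        public Summary plus(Summary o) {
            return new Summary(fileCount + o.fileCount,
                    directoryCount + o.directoryCount, length + o.length);
        }
    }

    // Hypothetical store; each list() stands in for one remote LIST request.
    public static final class Store {
        private final Map<String, List<Entry>> children;
        public final AtomicInteger listCalls = new AtomicInteger();

        public Store(Map<String, List<Entry>> children) {
            this.children = children;
        }

        public List<Entry> list(String dir) {
            listCalls.incrementAndGet();
            return children.getOrDefault(dir, List.of());
        }
    }

    // Option 1: same treewalk, but sibling subdirectories are listed concurrently.
    public static Summary parallelTreewalk(Store store, String dir) {
        Summary children = store.list(dir).parallelStream()
                .map(e -> e.isDir()
                        ? parallelTreewalk(store, e.path())
                        : new Summary(1, 0, e.length()))
                // The identity must be all zeros: a parallel reduce may apply it many times.
                .reduce(new Summary(0, 0, 0), Summary::plus);
        return children.plus(new Summary(0, 1, 0)); // count this directory once
    }

    // Option 2: synthesize counts from a flat, files-only recursive listing.
    // Assumes root has no trailing slash and every file path lies under root.
    public static Summary fromDeepListing(String root, List<Entry> files) {
        Set<String> dirs = new HashSet<>();
        dirs.add(root);
        long length = 0;
        for (Entry f : files) {
            length += f.length();
            String p = f.path();
            int slash;
            // Walk up parent paths until reaching root's immediate children.
            while ((slash = p.lastIndexOf('/')) > root.length()) {
                p = p.substring(0, slash);
                if (!dirs.add(p)) {
                    break; // already seen, so its ancestors were added too
                }
            }
        }
        return new Summary(files.size(), dirs.size(), length);
    }
}
```

The trade-off is between accuracy and request count. The parallel walk keeps exact counts but issues the same number of requests. The deep listing can need far fewer requests when there are many directories and the store can list a whole prefix in flat pages. The price is a directory count that misses empty directories.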

> ABFS: Implementation for getContentSummary
> ------------------------------------------
>
>                 Key: HADOOP-17428
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17428
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/azure
>    Affects Versions: 3.3.0
>            Reporter: Sumangala Patki
>            Assignee: Sumangala Patki
>            Priority: Major
>
> Adds an implementation of the HDFS method getContentSummary, which takes a Path
> argument and returns details such as the file/directory count and space used
> under that path.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
