[ https://issues.apache.org/jira/browse/HADOOP-17362?focusedWorklogId=511523&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-511523 ]
ASF GitHub Bot logged work on HADOOP-17362:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 13/Nov/20 20:22
Start Date: 13/Nov/20 20:22
Worklog Time Spent: 10m
Work Description: jbrennan333 merged pull request #2444:
URL: https://github.com/apache/hadoop/pull/2444
Issue Time Tracking
-------------------
Worklog Id: (was: 511523)
Time Spent: 1h 10m (was: 1h)
> Doing hadoop ls on Har file triggers too many RPC calls
> -------------------------------------------------------
>
> Key: HADOOP-17362
> URL: https://issues.apache.org/jira/browse/HADOOP-17362
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs
> Reporter: Ahmed Hussein
> Assignee: Ahmed Hussein
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> [~daryn] has noticed that invoking hadoop ls on a HAR takes too much
> time.
> The HAR filesystem has multiple deficiencies that significantly impact
> performance:
> # Parsing the master index, which references ranges within the archive
> index: each range required re-opening the HDFS input stream and seeking to
> the same location where the previous read had stopped.
> # Listing a HAR stats the archive index for every "directory". The per-call
> cache used a unique key for each stat, rendering the cache useless and
> significantly increasing memory pressure.
> # Determining the children of a directory scans the entire archive contents
> and filters for the children, even though the cached metadata already stores
> the exact child list.
> # Globbing a HAR's contents resulted in unnecessary stats for every leaf path.
>
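A minimal Java sketch of the behavior that items 1 to 3 point toward: read the index once through a single sequential reader, cache entries keyed by path, and answer child listings from the cached child list instead of scanning the whole archive. It assumes a hypothetical, simplified index format of one entry per line (`path dir child1 child2 ...` or `path file`); the class `HarIndexSketch` and its methods are illustrative only and are not Hadoop's `HarFileSystem` API or its real index format.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HarIndexSketch {
    // Hypothetical index entry; NOT the real HAR index layout.
    static final class Entry {
        final String path;
        final boolean isDir;
        final List<String> children;

        Entry(String path, boolean isDir, List<String> children) {
            this.path = path;
            this.isDir = isDir;
            this.children = children;
        }
    }

    // One cache keyed by path, so repeated stats of the same path hit the same entry.
    private final Map<String, Entry> entries = new HashMap<>();

    // Parse the whole index in one sequential pass over a single reader,
    // rather than re-opening and seeking the stream for every range.
    HarIndexSketch(BufferedReader reader) throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isEmpty()) {
                continue;
            }
            String[] parts = line.split(" ");
            boolean isDir = parts.length > 1 && parts[1].equals("dir");
            List<String> children = (isDir && parts.length > 2)
                ? Collections.unmodifiableList(
                      new ArrayList<>(Arrays.asList(parts).subList(2, parts.length)))
                : Collections.<String>emptyList();
            entries.put(parts[0], new Entry(parts[0], isDir, children));
        }
    }

    // Stat is a single map lookup; no further reads of the index.
    Entry stat(String path) {
        return entries.get(path);
    }

    // Children come straight from the cached child list; no full-archive scan.
    List<String> listChildren(String dir) {
        Entry e = entries.get(dir);
        if (e == null || !e.isDir) {
            return Collections.emptyList();
        }
        return e.children;
    }

    public static void main(String[] args) throws IOException {
        String index = "/ dir a b\n/a file\n/b dir c\n/b/c file\n";
        HarIndexSketch idx =
            new HarIndexSketch(new BufferedReader(new StringReader(index)));
        System.out.println(idx.listChildren("/"));  // prints [a, b]
        System.out.println(idx.listChildren("/b")); // prints [c]
        System.out.println(idx.stat("/a").isDir);   // prints false
    }
}
```

The design choice is to trade a one-time parse and a path-keyed map for O(1) stats and child lookups; item 4 (globbing) is not modeled here.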
--
This message was sent by Atlassian Jira
(v8.3.4#803005)