[ 
https://issues.apache.org/jira/browse/MAPREDUCE-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744331#action_12744331
 ] 

Koji Noguchi commented on MAPREDUCE-865:
----------------------------------------

Simple testing.
Created a har archive called myarchive.har containing
/a/b/2000files/xaaaaa to xaadnj
and /a/b/2000files/2000files/xaaaaa to xaadnj

About 4500 files.

Without the patch, 
/usr/bin/time hadoop dfs -lsr har:///user/knoguchi/myarchive.har > /dev/null    
                  
31.72user 5.23system *1:13.19* elapsed 50%CPU (0avgtext+0avgdata 0maxresident)

with 9000 open calls to the namenode (_masterindex and _index) and also 4500 
filestatus calls to _index (I think).

With the patch, 
23.59user 0.58system *0:22.97* elapsed 105%CPU (0avgtext+0avgdata 0maxresident)

with one _masterindex open call and five _index open calls.
Setting -Dfs.har.indexcache.num=1 changed the number of _index open calls  to 
10 times, but elapsed  time didn't change much.
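
For a quick side-by-side of the two runs, the numbers above can be reduced to 
a speedup and a call count. This is only a sketch of the arithmetic, using the 
elapsed fields and open-call counts quoted in this comment; the helper name 
elapsed_seconds is made up for illustration and only handles the m:ss.ss form 
shown here.

```python
def elapsed_seconds(text):
    """Convert an m:ss.ss elapsed field from /usr/bin/time into seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + float(seconds)

# Elapsed fields copied from the two runs above.
before = elapsed_seconds("1:13.19")
after = elapsed_seconds("0:22.97")
speedup = before / after

# Open calls reported above: 9000 without the patch,
# one _masterindex plus five _index with it.
opens_before = 9000
opens_after = 1 + 5

print(f"elapsed: {before:.2f}s -> {after:.2f}s ({speedup:.2f}x)")
print(f"open calls: {opens_before} -> {opens_after}")
```

The drop in open calls is far larger than the drop in elapsed time.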


The goal of the patch is to reduce the load (number of calls) on the namenode 
rather than to speed up the 'ls' commands.

Note that since the client caches the entire _masterindex and also caches each 
STORE (cache range) it reads, the initial call would be slower.
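
To illustrate why a smaller cache means more _index opens, here is a rough 
sketch of a bounded least-recently-used cache of index ranges. It is not the 
code in the patch: the class IndexRangeCache, the function count_open_calls 
and the lookup sequence are all hypothetical, and it assumes that 
fs.har.indexcache.num bounds the number of cached ranges, which the test above 
suggests but does not prove.

```python
from collections import OrderedDict

class IndexRangeCache:
    """Bounded LRU cache of index ranges (hypothetical, not the patch's code)."""

    def __init__(self, capacity, load_range):
        self.capacity = capacity
        self.load_range = load_range  # stands in for open/read/close of _index
        self.entries = OrderedDict()
        self.open_calls = 0

    def get(self, range_id):
        if range_id in self.entries:
            # Cache hit: mark as most recently used, no open call needed.
            self.entries.move_to_end(range_id)
            return self.entries[range_id]
        # Cache miss: one simulated open call to fetch the range.
        self.open_calls += 1
        data = self.load_range(range_id)
        self.entries[range_id] = data
        if len(self.entries) > self.capacity:
            # Evict the least recently used range.
            self.entries.popitem(last=False)
        return data

def count_open_calls(capacity, lookups):
    """Replay a sequence of range lookups and count simulated open calls."""
    cache = IndexRangeCache(capacity, load_range=lambda r: f"entries for range {r}")
    for range_id in lookups:
        cache.get(range_id)
    return cache.open_calls

# Made-up access pattern that revisits ranges.
lookups = [0, 1, 0, 1, 2, 2, 0]
print(count_open_calls(1, lookups))
print(count_open_calls(10, lookups))
```

With room for only one range, every switch between ranges costs another open; 
with room for all of them, each range is opened once.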



> harchive: Reduce the number of open calls  to _index and _masterindex 
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-865
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-865
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: harchive
>            Reporter: Koji Noguchi
>            Priority: Minor
>         Attachments: mapreduce-865-0.patch
>
>
> When I have a har file with 1000 files in it, 
>    % hadoop dfs -lsr har:///user/knoguchi/myhar.har/
> would open/read/close the _index/_masterindex files 1000 times.
> This makes the client slow and adds some load to the namenode as well.
> Is there any way to reduce this number?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
