[ https://issues.apache.org/jira/browse/MAPREDUCE-7241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931994#comment-16931994 ]
Zhihua Deng commented on MAPREDUCE-7241:
----------------------------------------

Changing listLocatedStatus like that is hard to debug. [~ste...@apache.org], can we do a copy here when listing status? Since new BlockLocation(location) strips the LocatedBlock info from the _location_ instance, this seems to be the easiest way.

> FileInputFormat listStatus causes oom when there are lots of files in HDFS
> --------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-7241
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7241
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: job submission
>    Affects Versions: 2.6.1
>            Reporter: Zhihua Deng
>            Priority: Major
>         Attachments: filestatus.png
>
>
> This case is sometimes seen in Hive when a user issues queries over all partitions by mistake. The file statuses cached while listing status can accumulate to over 3 GB. Digging into the heap dump shows that LocatedBlock accounts for about 50% (sometimes over 60%) of the memory retained by LocatedFileStatus, as shown below:
> !filestatus.png!
> Right now we only extract the block location info from LocatedFileStatus; the datanode infos (types) and block tokens are not taken into account. So there is no need to cache LocatedBlock, and we can do something like this:
>
> BlockLocation[] blockLocations = dup(stat.getBlockLocations());
> LocatedFileStatus shrink = new LocatedFileStatus(stat, blockLocations);
>
> private static BlockLocation[] dup(BlockLocation[] blockLocations) {
>   BlockLocation[] copyLocs = new BlockLocation[blockLocations.length];
>   int i = 0;
>   for (BlockLocation location : blockLocations) {
>     copyLocs[i++] = new BlockLocation(location);
>   }
>   return copyLocs;
> }
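For illustration, a minimal, self-contained sketch of the copy-on-list idea suggested above. The class and method names (ShrunkStatusLister, copyLocations, listShrunkStatuses) are hypothetical and not from any attached patch; it assumes a Hadoop release that provides the BlockLocation copy constructor used in the snippet above.

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

/**
 * Sketch only: rebuild each LocatedFileStatus with plain BlockLocation
 * copies so the NameNode-supplied LocatedBlock objects are not retained
 * while the statuses sit in a cache.
 */
public final class ShrunkStatusLister {

  private ShrunkStatusLister() {
  }

  /** Copy every element so none of them is an HDFS subclass holding a LocatedBlock. */
  static BlockLocation[] copyLocations(BlockLocation[] blockLocations) {
    BlockLocation[] copies = new BlockLocation[blockLocations.length];
    for (int i = 0; i < blockLocations.length; i++) {
      // The copy constructor produces a plain BlockLocation; any LocatedBlock
      // carried by an HdfsBlockLocation is left behind.
      copies[i] = new BlockLocation(blockLocations[i]);
    }
    return copies;
  }

  /** List a directory and shrink each status before returning it. */
  public static List<LocatedFileStatus> listShrunkStatuses(FileSystem fs, Path dir)
      throws IOException {
    List<LocatedFileStatus> result = new ArrayList<>();
    RemoteIterator<LocatedFileStatus> it = fs.listLocatedStatus(dir);
    while (it.hasNext()) {
      LocatedFileStatus stat = it.next();
      result.add(new LocatedFileStatus(stat, copyLocations(stat.getBlockLocations())));
    }
    return result;
  }

  public static void main(String[] args) throws IOException {
    // Usage example: print the shrunk statuses for a path given on the command line.
    FileSystem fs = FileSystem.get(new Configuration());
    for (LocatedFileStatus status : listShrunkStatuses(fs, new Path(args[0]))) {
      System.out.println(status.getPath() + " blocks=" + status.getBlockLocations().length);
    }
  }
}
{code}

The copies are plain BlockLocation instances, so the LocatedBlock (with its block tokens and datanode infos) held by HDFS's HdfsBlockLocation wrapper is no longer reachable from the cached statuses and can be garbage collected.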