[jira] [Commented] (MAPREDUCE-7241) FileInputFormat listStatus with less memory footprint

Hudson (Jira) Wed, 01 Apr 2020 06:20:39 -0700


    [ https://issues.apache.org/jira/browse/MAPREDUCE-7241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072768#comment-17072768 ]


Hudson commented on MAPREDUCE-7241:
-----------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18108 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/18108/])
MAPREDUCE-7241. FileInputFormat listStatus with less memory footprint. (jlowe: 
rev c613296dc85ac7b22c171c84f578106b315cc012)
* (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/LocatedFileStatusFetcher.java
* (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestFileInputFormat.java
* (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java
* (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java


> FileInputFormat listStatus with less memory footprint
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-7241
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7241
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: job submission
>    Affects Versions: 2.6.1
>            Reporter: Zhihua Deng
>            Assignee: Zhihua Deng
>            Priority: Major
>             Fix For: 3.4.0
>
>         Attachments: MAPREDUCE-7241.03.patch, MAPREDUCE-7241.04.patch, 
> MAPREDUCE-7241.05.patch, MAPREDUCE-7241.trunk.02.patch, 
> MAPREDUCE-7241.trunk.patch, filestatus.png
>
>
> This case is sometimes seen in Hive when a user mistakenly issues a query
> over all partitions. The file statuses cached while listing status can
> accumulate to over 3 GB. Digging into the memory dump shows that LocatedBlock
> occupies about 50% (sometimes over 60%) of the memory retained by
> LocatedFileStatus, as shown below:
> !filestatus.png!
> Currently we only extract the block location info from LocatedFileStatus;
> the datanode infos (types) and the block token are not used. So there is no
> need to cache LocatedBlock. Instead, do something like this:
> // stat is the LocatedFileStatus returned by the listing
> BlockLocation[] blockLocations = dup(stat.getBlockLocations());
> LocatedFileStatus shrink = new LocatedFileStatus(stat, blockLocations);
>
> // Copy each location into a plain BlockLocation so the cached status no
> // longer references the underlying LocatedBlock.
> private static BlockLocation[] dup(BlockLocation[] blockLocations) {
>   BlockLocation[] copyLocs = new BlockLocation[blockLocations.length];
>   int i = 0;
>   for (BlockLocation location : blockLocations) {
>     copyLocs[i++] = new BlockLocation(location);
>   }
>   return copyLocs;
> }
>  
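The shrinking idea above can be illustrated with a minimal, self-contained sketch that runs without Hadoop on the classpath. It assumes the mechanism described in the issue: the listed locations are instances of a subclass that also pins a large per-block object, and copying each one through the base class's copy constructor drops that reference. `Location`, `HeavyLocation` and `ShrinkStatusSketch` are hypothetical stand-ins for BlockLocation, an HDFS-specific subclass holding a LocatedBlock, and the caching code; they are not Hadoop classes.

```java
import java.util.Arrays;

// Hypothetical stand-in for org.apache.hadoop.fs.BlockLocation (NOT the real class).
class Location {
    final String[] hosts;
    final long offset;
    final long length;

    Location(String[] hosts, long offset, long length) {
        this.hosts = hosts;
        this.offset = offset;
        this.length = length;
    }

    // Copy constructor: keeps only the fields the split planner needs.
    Location(Location that) {
        this(that.hosts, that.offset, that.length);
    }
}

// Hypothetical subclass that, like an HDFS block location, also pins a large
// per-block object (standing in for LocatedBlock: datanode infos, token, ...).
class HeavyLocation extends Location {
    final byte[] blockMetadata;

    HeavyLocation(String[] hosts, long offset, long length, int metadataBytes) {
        super(hosts, offset, length);
        this.blockMetadata = new byte[metadataBytes];
    }
}

class ShrinkStatusSketch {
    // Re-wrap every location in the plain base type. The returned array no
    // longer references the heavy payload, so once the original array is
    // unreachable, the payload becomes eligible for garbage collection.
    static Location[] dup(Location[] blockLocations) {
        Location[] copyLocs = new Location[blockLocations.length];
        int i = 0;
        for (Location location : blockLocations) {
            copyLocs[i++] = new Location(location);
        }
        return copyLocs;
    }

    public static void main(String[] args) {
        Location[] listed = {
            new HeavyLocation(new String[] {"dn1", "dn2"}, 0L, 128L, 1 << 20),
            new HeavyLocation(new String[] {"dn3"}, 128L, 64L, 1 << 20),
        };
        Location[] cached = dup(listed);
        for (Location loc : cached) {
            // Prints the runtime class to show the heavy subclass is gone.
            System.out.println(loc.getClass().getSimpleName() + " "
                + Arrays.toString(loc.hosts) + " @" + loc.offset);
        }
    }
}
```

Running `main` prints `Location [dn1, dn2] @0` and `Location [dn3] @128`: the cached copies carry the same hosts, offsets and lengths, but none of them is a `HeavyLocation` any more. The copy is shallow (the hosts array is shared), which is enough here because the goal is only to stop retaining the per-block payload.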



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
