[ https://issues.apache.org/jira/browse/MAPREDUCE-7241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931994#comment-16931994 ]
Zhihua Deng commented on MAPREDUCE-7241:
----------------------------------------

Changing listLocatedStatus like that is hard to debug. [~ste...@apache.org], can we do a copy here when listing status? Since new BlockLocation(location) strips the LocatedBlock info from the _location_ instance, this seems to be the easiest way.

> FileInputFormat listStatus causes oom when there are lots of files in HDFS
> --------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-7241
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7241
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: job submission
>    Affects Versions: 2.6.1
>            Reporter: Zhihua Deng
>            Priority: Major
>         Attachments: filestatus.png
>
>
> This case is sometimes seen in Hive when a user issues queries over all partitions by mistake. The file statuses cached while listing status can accumulate to over 3 GB. Digging into the heap dump shows that LocatedBlock accounts for about 50% (sometimes over 60%) of the memory retained by LocatedFileStatus, as shown below:
> !filestatus.png!
> Right now we only extract the block location info from LocatedFileStatus; the datanode infos (types) and block tokens are not taken into account. So there is no need to cache LocatedBlock, and we can do something like this:
>
> BlockLocation[] blockLocations = dup(stat.getBlockLocations());
> LocatedFileStatus shrink = new LocatedFileStatus(stat, blockLocations);
>
> private static BlockLocation[] dup(BlockLocation[] blockLocations) {
>   BlockLocation[] copyLocs = new BlockLocation[blockLocations.length];
>   int i = 0;
>   for (BlockLocation location : blockLocations) {
>     copyLocs[i++] = new BlockLocation(location);
>   }
>   return copyLocs;
> }
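For illustration, a minimal, self-contained sketch of the copy-on-list idea suggested above. The class and method names (ShrunkStatusLister, copyLocations, listShrunkStatuses) are hypothetical and not from any attached patch; it assumes a Hadoop release that provides the BlockLocation copy constructor used in the snippet above.

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

/**
 * Sketch only: rebuild each LocatedFileStatus with plain BlockLocation
 * copies so the NameNode-supplied LocatedBlock objects are not retained
 * while the statuses sit in a cache.
 */
public final class ShrunkStatusLister {

  private ShrunkStatusLister() {
  }

  /** Copy every element so none of them is an HDFS subclass holding a LocatedBlock. */
  static BlockLocation[] copyLocations(BlockLocation[] blockLocations) {
    BlockLocation[] copies = new BlockLocation[blockLocations.length];
    for (int i = 0; i < blockLocations.length; i++) {
      // The copy constructor produces a plain BlockLocation; any LocatedBlock
      // carried by an HdfsBlockLocation is left behind.
      copies[i] = new BlockLocation(blockLocations[i]);
    }
    return copies;
  }

  /** List a directory and shrink each status before returning it. */
  public static List<LocatedFileStatus> listShrunkStatuses(FileSystem fs, Path dir)
      throws IOException {
    List<LocatedFileStatus> result = new ArrayList<>();
    RemoteIterator<LocatedFileStatus> it = fs.listLocatedStatus(dir);
    while (it.hasNext()) {
      LocatedFileStatus stat = it.next();
      result.add(new LocatedFileStatus(stat, copyLocations(stat.getBlockLocations())));
    }
    return result;
  }

  public static void main(String[] args) throws IOException {
    // Usage example: print the shrunk statuses for a path given on the command line.
    FileSystem fs = FileSystem.get(new Configuration());
    for (LocatedFileStatus status : listShrunkStatuses(fs, new Path(args[0]))) {
      System.out.println(status.getPath() + " blocks=" + status.getBlockLocations().length);
    }
  }
}
{code}

The copies are plain BlockLocation instances, so the LocatedBlock (with its block tokens and datanode infos) held by HDFS's HdfsBlockLocation wrapper is no longer reachable from the cached statuses and can be garbage collected.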