[ https://issues.apache.org/jira/browse/MAPREDUCE-7241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Darrell Lowe reassigned MAPREDUCE-7241:
---------------------------------------------
Assignee: Zhihua Deng
> FileInputFormat listStatus with less memory footprint
> -----------------------------------------------------
>
> Key: MAPREDUCE-7241
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7241
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: job submission
> Affects Versions: 2.6.1
> Reporter: Zhihua Deng
> Assignee: Zhihua Deng
> Priority: Major
> Attachments: MAPREDUCE-7241.03.patch, MAPREDUCE-7241.04.patch,
> MAPREDUCE-7241.05.patch, MAPREDUCE-7241.trunk.02.patch,
> MAPREDUCE-7241.trunk.patch, filestatus.png
>
>
> This situation sometimes occurs in Hive when a user mistakenly issues a query
> over all partitions. The file statuses cached while listing can accumulate to
> over 3 GB. A heap dump shows that LocatedBlock accounts for about 50% (sometimes
> over 60%) of the memory retained by LocatedFileStatus, as shown below:
> !filestatus.png!
> Right now we only extract the block location info from LocatedFileStatus;
> the datanode info (types) and the block token are not used. So there is no
> need to cache the LocatedBlock; instead we can do something like this:
> // Copy each BlockLocation into a plain BlockLocation so that the
> // LocatedBlock it references is no longer retained by the cache.
> BlockLocation[] blockLocations = dup(stat.getBlockLocations());
> LocatedFileStatus shrink = new LocatedFileStatus(stat, blockLocations);
>
> private static BlockLocation[] dup(BlockLocation[] blockLocations) {
>   BlockLocation[] copyLocs = new BlockLocation[blockLocations.length];
>   int i = 0;
>   for (BlockLocation location : blockLocations) {
>     copyLocs[i++] = new BlockLocation(location);
>   }
>   return copyLocs;
> }
>
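A self-contained sketch of the same shrinking pattern follows. It deliberately uses hypothetical stand-in classes (`BlockLoc`, `HeavyBlockLoc`, `ShrinkLocations`) instead of Hadoop's `BlockLocation` and `LocatedFileStatus`, so it runs without Hadoop on the classpath. The assumption, following the description above, is that the listed locations are instances of a subclass that also references a large per-block object. Copying each element through the base-class copy constructor keeps only the fields the caller reads, so the large objects become unreachable once the original array is dropped.

```java
import java.util.Arrays;

public class ShrinkLocations {
    // Hypothetical stand-in for a block location (NOT Hadoop's BlockLocation).
    static class BlockLoc {
        final String[] hosts;
        final long offset;
        final long length;

        BlockLoc(String[] hosts, long offset, long length) {
            this.hosts = hosts.clone();
            this.offset = offset;
            this.length = length;
        }

        // Copy constructor: copies only the fields declared in BlockLoc, so any
        // extra state held by a subclass instance is not carried over.
        BlockLoc(BlockLoc that) {
            this(that.hosts, that.offset, that.length);
        }
    }

    // Hypothetical subclass that also references a large object the caller
    // never reads (playing the role the LocatedBlock plays in the issue).
    static class HeavyBlockLoc extends BlockLoc {
        final byte[] payload;

        HeavyBlockLoc(String[] hosts, long offset, long length, int payloadBytes) {
            super(hosts, offset, length);
            this.payload = new byte[payloadBytes];
        }
    }

    // Rebuild every element as the plain base type, dropping subclass state.
    static BlockLoc[] dup(BlockLoc[] locs) {
        BlockLoc[] copy = new BlockLoc[locs.length];
        for (int i = 0; i < locs.length; i++) {
            copy[i] = new BlockLoc(locs[i]);
        }
        return copy;
    }

    public static void main(String[] args) {
        BlockLoc[] listed = {
            new HeavyBlockLoc(new String[] {"dn1", "dn2"}, 0L, 128L, 1 << 20),
            new HeavyBlockLoc(new String[] {"dn3"}, 128L, 64L, 1 << 20)
        };
        // Only the copies are kept; the 1 MiB payloads are now unreachable.
        BlockLoc[] cached = dup(listed);
        for (BlockLoc b : cached) {
            System.out.println(b.getClass().getSimpleName() + " "
                + Arrays.toString(b.hosts) + " @" + b.offset + "+" + b.length);
        }
    }
}
```

Running `main` prints `BlockLoc [dn1, dn2] @0+128` and `BlockLoc [dn3] @128+64`: every cached element is the plain base type, and none of them references a payload.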
--
This message was sent by Atlassian Jira
(v8.3.4#803005)