[ https://issues.apache.org/jira/browse/MAPREDUCE-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Miklos Erdelyi updated MAPREDUCE-1363:
--------------------------------------
Summary: Spill size underestimated when using certain combiners, causing
map tasks to fail (was: Spill size underestimated when using a combiner,
causing map tasks to fail)
> Spill size underestimated when using certain combiners, causing map tasks to
> fail
> ---------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1363
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1363
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: task
> Affects Versions: 0.20.1
> Reporter: Miklos Erdelyi
>
> Spill size can be underestimated when using certain combiners, causing map
> tasks to fail when disk space is very low in one of the mapred.local.dir
> directories.
> During sortAndSpill(), MapOutputBuffer obtains an output path from
> LocalDirAllocator, which checks whether the estimated size of the spill is
> available on any of the paths specified for intermediate data storage. If a
> combiner is specified that emits key-value pairs whose serialized size is
> larger than that of the input key-value pairs, the size of the spill file is
> underestimated. If LocalDirAllocator then selects a path for intermediate
> data storage that does not have enough space to hold the spilled records, an
> IOException is thrown and the map task fails.
> This could be avoided either by improving the estimate of the spill size
> (increasing it by a constant amount or by a constant percentage), or by
> having LocalDirAllocator take into account a configuration parameter
> specifying how much extra unused space must remain on the path returned by
> getLocalPathForWrite (similar to dfs.datanode.du.reserved). If there is no
> space left on a device designated for writing intermediate data, the spill
> could be retried on a different device (without failing the map task).
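
The three mitigations proposed above (padding the estimate, reserving free space per directory, and retrying on another directory) can be illustrated with a small self-contained sketch. This is not Hadoop code: SpillDirChooser, LocalDir, paddedEstimate, pick, and spillWithRetry are hypothetical names invented for illustration, and the free-space figures are simulated rather than read from disk. It only models the selection logic under those assumptions.

```java
import java.util.List;

/** Hypothetical sketch of the proposed mitigations; none of these names are Hadoop APIs. */
final class SpillDirChooser {

    /** Stand-in for one mapred.local.dir entry with a simulated amount of free space. */
    static final class LocalDir {
        final String path;
        final long freeBytes;

        LocalDir(String path, long freeBytes) {
            this.path = path;
            this.freeBytes = freeBytes;
        }
    }

    /** Mitigation 1: inflate the raw estimate by a constant percentage plus a constant amount. */
    static long paddedEstimate(long rawBytes, double extraFraction, long extraBytes) {
        return rawBytes + (long) Math.ceil(rawBytes * extraFraction) + extraBytes;
    }

    /**
     * Mitigation 2: accept a directory only if the estimate fits while leaving
     * reservedBytes unused. Scans from startIndex; returns the index or -1.
     */
    static int pick(List<LocalDir> dirs, long estimate, long reservedBytes, int startIndex) {
        for (int i = startIndex; i < dirs.size(); i++) {
            if (dirs.get(i).freeBytes - reservedBytes >= estimate) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Mitigation 3: if the real spill does not fit in the chosen directory
     * (simulating "no space left on device"), move on to the next acceptable
     * directory instead of failing. Returns the path used, or null if none fits.
     */
    static String spillWithRetry(List<LocalDir> dirs, long estimate,
                                 long reservedBytes, long actualBytes) {
        int i = pick(dirs, estimate, reservedBytes, 0);
        while (i >= 0) {
            if (dirs.get(i).freeBytes >= actualBytes) {
                return dirs.get(i).path; // the simulated write succeeds here
            }
            i = pick(dirs, estimate, reservedBytes, i + 1); // retry on a later directory
        }
        return null;
    }
}
```

With a raw estimate of 1000 bytes, an actual spill of 1300 bytes, and directories offering 1100 and 5000 free bytes, the unpadded check selects the first directory, which is the failure described in this issue. Any one of the three mitigations steers the spill to the second directory in this scenario.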