[
https://issues.apache.org/jira/browse/MAPREDUCE-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Todd Lipcon updated MAPREDUCE-1347:
-----------------------------------
Status: Open (was: Patch Available)
I don't think this double-checked locking is safe. One thread may be calling
put() on the map inside the synchronized section while another is calling
get() outside it, so the getter can observe the map mid-update. This can cause
the getter to throw an exception or sometimes even return the wrong entry.
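
To illustrate one possible alternative, here is a minimal sketch that avoids
the unsynchronized get() entirely by delegating to a concurrent map. It is not
the attached patch and not the actual MultipleOutputFormat code: WriterCache,
Writer, getWriter and the created counter are hypothetical names, and it
assumes Java 8+ for ConcurrentHashMap.computeIfAbsent, which is newer than the
JDKs these Hadoop versions target. On an older JDK the equivalent would be to
hold the same lock around both the get() and the put().

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class WriterCache {
    /** Hypothetical stand-in for a per-file RecordWriter. */
    static final class Writer {
        final String path;
        Writer(String path) { this.path = path; }
    }

    // A thread-safe map: lookups and inserts may race without corrupting it.
    private final Map<String, Writer> writers = new ConcurrentHashMap<>();

    // Counts how many writers were actually created (for demonstration only).
    final AtomicInteger created = new AtomicInteger();

    /**
     * Returns the writer for the given path, creating it at most once.
     * computeIfAbsent runs the factory atomically per key, so two threads
     * asking for the same path never both "create the file", while threads
     * asking for different paths are not serialized behind one global lock.
     */
    Writer getWriter(String path) {
        return writers.computeIfAbsent(path, p -> {
            created.incrementAndGet();
            return new Writer(p);
        });
    }
}
```

Callers on any thread would just call getWriter(path). The factory should
stay short, since other updates that hash to the same bin can block while it
runs.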
> Missing synchronization in MultipleOutputFormat
> -----------------------------------------------
>
> Key: MAPREDUCE-1347
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1347
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 0.21.0, 0.20.2, 0.22.0
> Reporter: Todd Lipcon
> Assignee: Harsh J Chouraria
> Attachments: mapreduce.1347.r1.diff
>
>
> MultipleOutputFormat's RecordWriter implementation doesn't use
> synchronization when accessing the recordWriters member. When using
> multithreaded mappers or reducers, this can result in problems where two
> threads will both try to create the same file, causing
> AlreadyBeingCreatedException. Using finer-grained locking than just
> synchronizing the whole method is probably a good idea, so that multithreaded
> mappers can actually achieve parallelism when writing to separate output
> streams.
> From what I can tell, the new API's MultipleOutputs seems not to have this
> issue.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira