[
https://issues.apache.org/jira/browse/HADOOP-12412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Harris updated HADOOP-12412:
------------------------------------
Attachment: HADOOP-12412.patch
I wrote a test for it, but as I mentioned, I'm out of the office and on a
Windows laptop right now, so basically nothing works for me (I can't even run
the tests). I could table this for the week until I'm back on a Linux machine,
or someone else could try applying the patch and checking that it works
properly. It's a pretty serious bug IMO (though it has been around more or
less forever), so I feel it should be prioritized relatively high for the next
release.
> Concurrency in FileSystem$Cache is very broken
> ----------------------------------------------
>
> Key: HADOOP-12412
> URL: https://issues.apache.org/jira/browse/HADOOP-12412
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs
> Affects Versions: 2.7.0
> Reporter: Michael Harris
> Assignee: Michael Harris
> Priority: Critical
> Attachments: HADOOP-12412.patch, HADOOP-12412.patch
>
>
> The FileSystem cache uses a small amount of synchronization to protect the
> cache itself, but does nothing to prevent multiple instances of the same
> filesystem from being constructed and initialized simultaneously. At best,
> this leads to potentially expensive wasted work. At worst, as is the case
> for Spark, it can lead to deadlocks/livelocks, especially when the same
> configuration object is passed into both calls. This should be refactored
> to use a results cache approach (see Java Concurrency in Practice, chapter
> 5, section 6, for an example of how to do this correctly), which would be
> both faster and safer.
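
For readers who have not seen the results cache approach mentioned above, here is a minimal sketch of the idea in the style of the Memoizer from Java Concurrency in Practice. It is not the attached patch and not the actual FileSystem$Cache code: the class name ResultsCache, the factory parameter, and the use of IllegalStateException for failures are all assumptions made for illustration. The point it shows is that the map stores a Future per key, so only one caller constructs the value while every other caller for that key waits on the same Future.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.function.Function;

// Hypothetical illustration only; not Hadoop's FileSystem$Cache.
final class ResultsCache<K, V> {
    // Each key maps to a Future, so an in-progress construction is visible to other callers.
    private final ConcurrentMap<K, Future<V>> cache = new ConcurrentHashMap<>();
    // Assumed stand-in for "construct and initialize a filesystem for this key".
    private final Function<K, V> factory;

    ResultsCache(Function<K, V> factory) {
        this.factory = factory;
    }

    V get(K key) {
        while (true) {
            Future<V> future = cache.get(key);
            if (future == null) {
                FutureTask<V> task = new FutureTask<>(() -> factory.apply(key));
                // Atomic check-then-act: only one thread wins the right to construct.
                future = cache.putIfAbsent(key, task);
                if (future == null) {
                    future = task;
                    // Construction runs on the winning thread, with no map-wide lock held.
                    task.run();
                }
            }
            try {
                // Losers block here until the winner's construction completes.
                return future.get();
            } catch (CancellationException e) {
                // Drop the cancelled entry and retry.
                cache.remove(key, future);
            } catch (ExecutionException e) {
                // Do not cache failures; a later call may succeed.
                cache.remove(key, future);
                throw new IllegalStateException("construction failed for " + key, e.getCause());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for " + key, e);
            }
        }
    }
}
```

With this shape, two threads asking for the same key at the same time trigger a single call to the factory. One caveat of the sketch: a factory that calls get for its own key on the same thread would block on itself, so real code would need to rule that out.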