Thank you, all!  This is quite helpful.

We have been arguing about how to handle this issue across a growing application. Unfortunately, the Hadoop FileSystem Javadoc should say all of this, but it doesn't!

Kevin

On 05/22/2014 01:48 PM, Aaron Davidson wrote:
In Spark 0.9.0 and 0.9.1, we stopped using the FileSystem cache correctly,
and we just recently resumed using it in 1.0 (and in 0.9.2) when this issue
was fixed: https://issues.apache.org/jira/browse/SPARK-1676

Prior to this fix, each Spark task created and cached its own FileSystems
due to a bug in how the FS cache handles UGIs. The big problem that arose
was that these FileSystems were never closed, so they just kept piling up.
There were two solutions we considered, with the following effects: (1)
share the FS cache among all tasks, or (2) give each task effectively its
own FS cache, and have it close all of its FSes after the task completes.

We chose solution (1) for 3 reasons:
  - It does not rely on the behavior of a bug in HDFS.
  - It is the most performant option.
  - It is most consistent with the semantics of the (albeit broken) FS cache.

Since this behavior was changed in 1.0, it could be considered a
regression. We should consider the exact behavior we want out of the FS
cache. For Spark's purposes, it seems fine to cache FileSystems across
tasks, as Spark does not close FileSystems. The issue that comes up is that
user code which uses FileSystem.get() but then closes the FileSystem can
screw up Spark processes which were using that FileSystem. The workaround
for users would be to use FileSystem.newInstance() if they want full
control over the lifecycle of their FileSystems.
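For example, user code could do something roughly like this (just a sketch,
not code from Spark; the path is a placeholder):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class NewInstanceExample {
      public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // FileSystem.get(conf) may hand back the process-wide cached instance,
        // so calling close() on it can break Spark, which shares that instance.
        // FileSystem.newInstance(conf) returns a private instance that is safe
        // to close whenever the user code is done with it.
        FileSystem fs = FileSystem.newInstance(conf);
        try {
          fs.mkdirs(new Path("/tmp/example"));  // placeholder path
        } finally {
          fs.close();  // only closes this private instance, not the cached one
        }
      }
    }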


On Thu, May 22, 2014 at 12:06 PM, Colin McCabe <cmcc...@alumni.cmu.edu> wrote:

The FileSystem cache is something that has caused a lot of pain over the
years.  Unfortunately we (in Hadoop core) can't change the way it works now
because there are too many users depending on the current behavior.

Basically, the idea is that when you request a FileSystem with certain
options with FileSystem#get, you might get a reference to an FS object that
already exists, from our FS cache singleton.  Unfortunately, this
also means that someone else can change the working directory on you or
close the FS underneath you.  The FS is basically shared mutable state, and
you don't know whom you're sharing it with.

It might be better for Spark to call FileSystem#newInstance, which bypasses
the FileSystem cache and always creates a new object.  If Spark can hang on
to the FS for a while, it can get the benefits of caching without the
downsides.  In HDFS, multiple FS instances can also share things like the
socket cache between them.
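Roughly something like this, for example (an illustrative sketch only, not
actual Spark code; the class name and structure are made up):

    import java.io.IOException;
    import java.net.URI;
    import java.util.concurrent.ConcurrentHashMap;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    // Hypothetical helper: keeps its own FileSystem instances, created with
    // newInstance() so they are independent of the shared FileSystem cache
    // and of anything user code might close.
    public class PrivateFsCache {
      private final ConcurrentHashMap<URI, FileSystem> cache =
          new ConcurrentHashMap<URI, FileSystem>();

      public FileSystem get(URI uri, Configuration conf) throws IOException {
        FileSystem fs = cache.get(uri);
        if (fs == null) {
          fs = FileSystem.newInstance(uri, conf);  // bypasses the shared cache
          FileSystem prev = cache.putIfAbsent(uri, fs);
          if (prev != null) {  // lost a race with another thread; keep the winner
            fs.close();
            fs = prev;
          }
        }
        return fs;
      }

      // Close everything once HDFS access is no longer needed at all.
      public void closeAll() throws IOException {
        for (FileSystem fs : cache.values()) {
          fs.close();
        }
        cache.clear();
      }
    }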

best,
Colin


On Thu, May 22, 2014 at 10:06 AM, Marcelo Vanzin <van...@cloudera.com> wrote:
Hi Kevin,

On Thu, May 22, 2014 at 9:49 AM, Kevin Markey <kevin.mar...@oracle.com> wrote:
The FS closed exception only affects the cleanup of the staging directory,
not the final success or failure.  I've not yet tested the effect of
changing my application's initialization, use, or closing of FileSystem.
Without going and reading more of the Spark code, if your app is
explicitly close()'ing the FileSystem instance, it may be causing the
exception. If Spark is caching the FileSystem instance, your app is
probably closing that same instance (which it got from the HDFS
library's internal cache).

It would be nice if you could test that theory; it might be worth
knowing that's the case so that we can tell people not to do that.
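In other words, the pattern that would trigger it looks roughly like this
(hypothetical snippet; the path is a placeholder):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ClosedFsExample {
      public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();

        FileSystem fs1 = FileSystem.get(conf);  // comes from the shared cache
        FileSystem fs2 = FileSystem.get(conf);  // same cached instance: fs1 == fs2

        fs1.close();  // "cleanup" in user code closes the shared instance

        // Any later use of the cached instance -- including Spark's own use
        // when it cleans up the staging directory -- now fails with
        // "java.io.IOException: Filesystem closed".
        fs2.exists(new Path("/tmp/example"));  // placeholder path
      }
    }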

--
Marcelo

