[
https://issues.apache.org/jira/browse/HBASE-28222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17791815#comment-17791815
]
Bryan Beaudreault commented on HBASE-28222:
-------------------------------------------
I've spent a lot of time (too much) thinking about this, and I think the best
course of action is to revert HBASE-12819. I don't think the original premise
of that Jira really holds up:
* There was limited evidence presented in the jira. The author noted that some
FileSystems were not being closed, but it is not clear that was actually the
cause of the OOM.
* The author only closed 2 FileSystem objects. ExportSnapshot, a tool meant to
be run via the command line, would have to be run in the same process many,
many times for a leak of just 2 objects to build up into a problem.
* I instrumented FileSystem.get() and noted that a simple small ExportSnapshot
run, without verifySnapshot, creates 122 FileSystem objects. Most of those are
created within the MapReduce framework, and not closed.
* Further, when verifySnapshot is in the picture, it easily creates many
thousands more FileSystem objects even for a small table – 2-3 per StoreFile.
The core problem is the ubiquity of code like:
{code:java}
FileSystem fs = path.getFileSystem(conf);
path.makeQualified(fs.getUri(), fs.getWorkingDirectory());{code}
There are other examples where path.getFileSystem is called and treated as
throw-away as well. When caching is disabled, these become incredibly
problematic. This is really a design problem with the FileSystem cache in
Hadoop. There are numerous jiras about that cache in the HADOOP project. Even
if they fixed it, we support multiple versions of hadoop going back to 2.x.
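The difference the cache makes can be sketched as follows (a minimal
illustration, not HBase code; it uses the local file:// scheme, and note that
Hadoop's cache-disable key is per scheme, e.g. fs.file.impl.disable.cache or
fs.s3a.impl.disable.cache):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsCacheSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Disable caching for the file:// scheme, mirroring what
    // ExportSnapshot does for the schemes it touches.
    conf.setBoolean("fs.file.impl.disable.cache", true);

    Path path = new Path("file:///tmp");
    // With the cache disabled, every call constructs a brand-new
    // FileSystem that nothing tracks or closes:
    FileSystem fs1 = path.getFileSystem(conf);
    FileSystem fs2 = path.getFileSystem(conf);
    System.out.println(fs1 == fs2); // distinct instances

    // With the default (cache enabled), repeated calls return the same
    // shared instance, so "throw-away" getFileSystem calls do not
    // accumulate new objects:
    conf.setBoolean("fs.file.impl.disable.cache", false);
    System.out.println(path.getFileSystem(conf) == path.getFileSystem(conf));
  }
}
{code}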
So HBASE-12819 solved the leak of 2 FileSystem objects, at the expense of
leaking many thousands of objects. We should revert it so that the cache is
re-enabled. At that point, all of those path.getFileSystem calls will use
cached instances. We will still leak a small handful of objects, but those
should not be problematic.
If a user wants to run ExportSnapshot multiple times in a process, they
probably should call FileSystem.closeAll() between each run. Unfortunately due
to how the FileSystem cache system works, this is only safe to do if no other
threads have references to a cached FileSystem.
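A repeated-run driver might look like this (a hypothetical sketch – the
snapshot names and s3a destination are made up, and it assumes no other thread
still holds a cached FileSystem when closeAll() runs):
{code:java}
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hbase.snapshot.ExportSnapshot;
import org.apache.hadoop.util.ToolRunner;

public class RepeatedExportSketch {
  public static void main(String[] args) throws Exception {
    for (String snapshot : new String[] { "snap1", "snap2" }) { // hypothetical
      int rc = ToolRunner.run(new ExportSnapshot(),
          new String[] { "--snapshot", snapshot,
              "--copy-to", "s3a://bucket/backup" }); // hypothetical target
      if (rc != 0) {
        throw new RuntimeException("export failed for " + snapshot);
      }
      // Only safe here because no other thread references a cached FileSystem:
      FileSystem.closeAll();
    }
  }
}
{code}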
> Leak in ExportSnapshot during verifySnapshot on S3A
> ---------------------------------------------------
>
> Key: HBASE-28222
> URL: https://issues.apache.org/jira/browse/HBASE-28222
> Project: HBase
> Issue Type: Bug
> Reporter: Bryan Beaudreault
> Priority: Major
>
> Each S3AFileSystem creates an S3AInstrumentation and various metrics sources,
> with no real way to disable that. In HADOOP-18526, a bug was fixed so that
> these are not leaked. But in order to use that, you must call
> S3AFileSystem.close() when done.
> In ExportSnapshot, ever since HBASE-12819 we set fs.impl.disable.cache to
> true. It looks like that was added in order to prevent conflicting calls to
> close() between mapper and main thread when running in a single JVM.
> When verifySnapshot is enabled, SnapshotReferenceUtil.verifySnapshot iterates
> all storefiles (could be many thousands) and calls
> SnapshotReferenceUtil.verifyStoreFile on them. verifyStoreFile makes a number
> of static calls which end up in CommonFSUtils.getRootDir, which does
> Path.getFileSystem().
> Since the FS cache is disabled, every single call to Path.getFileSystem()
> creates a new FileSystem instance. That FS is short lived, and gets GC'd. But
> in the case of S3AFileSystem, this leaks all of the metrics stuff.
> We have two easy possible fixes:
> # Only set fs.impl.disable.cache when running hadoop in local mode, since
> that was the original problem.
> # When calling verifySnapshot, create a new Configuration which does not
> include the fs.impl.disable.cache setting.
> I tested out #2 in my environment and it fixed the leak.
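> The shape of fix #2 would be roughly (a sketch, not the committed patch; it
> assumes the per-scheme form of the cache key and uses
> SnapshotDescriptionUtils.readSnapshotInfo to obtain the SnapshotDescription):
> {code:java}
> Configuration verifyConf = new Configuration(conf);
> // Hadoop's actual key is per scheme, e.g. fs.s3a.impl.disable.cache:
> verifyConf.unset("fs." + fs.getUri().getScheme() + ".impl.disable.cache");
> SnapshotReferenceUtil.verifySnapshot(verifyConf, fs, snapshotDir,
>     SnapshotDescriptionUtils.readSnapshotInfo(fs, snapshotDir));
> {code}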
--
This message was sent by Atlassian Jira
(v8.20.10#820010)