[
https://issues.apache.org/jira/browse/HDFS-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16155242#comment-16155242
]
Steve Loughran commented on HDFS-1068:
--------------------------------------
S3Guard/HADOOP-13345 is really for caching shared state in DynamoDB, rather
than talking to the (very slow) S3 store. But you can run a metadata store
in-VM, which we do purely for testing: it gets inconsistent as soon as anyone
changes the store.
One thing we have done to try and improve perf is deprecate
FileSystem.exists(), .isDir() and .isFile(), pushing people to use
getFileStatus() instead. If you look through downstream code to see how those
probes get used, you see a lot of things like
{code}
if (fs.exists(path)) fs.delete(path);
{code}
or
{code}
if (!fs.exists(path)) fs.mkdirs(path);
{code}
or worst of all
{code}
if (fs.exists(path)) {
  FileStatus stat = fs.getFileStatus(path);
  ...
}
{code}
Essentially: calling getFileStatus(), discarding the result, and then calling
an operation which does the same checks itself and degrades gracefully if they
fail, or even duplicating the work outright.
Every IPC call is sacred, especially if blobstores convert that to an HTTP
request.
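To make the cost concrete, here is a minimal sketch of the last pattern next to its single-call replacement. It does not use the real Hadoop API: CountingStore is a hypothetical stand-in that counts simulated round trips, and its getFileStatus() returns just a length rather than a FileStatus. The only behaviour assumed from the real thing is that getFileStatus() raises FileNotFoundException for a missing path.

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a remote filesystem; counts simulated round trips. */
class CountingStore {
    private final Map<String, Long> lengths = new HashMap<>();
    int calls = 0; // number of simulated remote calls so far

    void put(String path, long length) {
        lengths.put(path, length);
    }

    /** One round trip; throws if the path is absent. */
    long getFileStatus(String path) throws FileNotFoundException {
        calls++;
        Long len = lengths.get(path);
        if (len == null) {
            throw new FileNotFoundException(path);
        }
        return len;
    }

    /** A status probe whose result is thrown away. */
    boolean exists(String path) {
        try {
            getFileStatus(path);
            return true;
        } catch (FileNotFoundException e) {
            return false;
        }
    }
}

class ProbeDemo {
    /** Probe, then fetch again: two round trips when the path exists. */
    static long lengthWithProbe(CountingStore fs, String path) {
        try {
            if (fs.exists(path)) {
                return fs.getFileStatus(path);
            }
        } catch (FileNotFoundException e) {
            // Still needed: the path can vanish between the two calls.
        }
        return -1;
    }

    /** One round trip; the exception carries the "missing" case. */
    static long lengthSingleCall(CountingStore fs, String path) {
        try {
            return fs.getFileStatus(path);
        } catch (FileNotFoundException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        CountingStore fs = new CountingStore();
        fs.put("/data/part-0000", 42L);

        lengthWithProbe(fs, "/data/part-0000");
        int probeCalls = fs.calls;

        fs.calls = 0;
        lengthSingleCall(fs, "/data/part-0000");

        System.out.println(probeCalls + " vs " + fs.calls); // prints "2 vs 1"
    }
}
```

Note that the probing version needs the catch block anyway, so the exists() call buys nothing except an extra round trip.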
> Reduce NameNode GC by reusing HdfsFileStatus objects in RPC handlers
> --------------------------------------------------------------------
>
> Key: HDFS-1068
> URL: https://issues.apache.org/jira/browse/HDFS-1068
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Hairong Kuang
> Assignee: Zhe Zhang
> Attachments: HDFS-1068.00.patch, Screen Shot 2017-08-31 at 3.58.15
> PM.png
>
>
> In our production clusters, getFileInfo is the most frequent operation that
> hits the NameNode, and its frequency is highly correlated to the GC behavior.
> HDFS-946 has already reduced the amount of heap/cpu and the number of
> temporary objects for each getFileInfo call. Yet another improvement is to
> avoid creation of an HdfsFileStatus object for each getFileInfo call. Instead
> each RPC handler can have a thread-local HdfsFileStatus object. Each
> getFileInfo call simply sets values for all fields of the thread-local
> HdfsFileStatus object.
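
The thread-local reuse described in the issue could look roughly like the sketch below. This is not the attached patch: ReusableStatus and StatusHandler are hypothetical names, the three fields are a made-up subset, and the real HdfsFileStatus is not used. It only illustrates the mechanism of one mutable object per handler thread whose fields are overwritten on each call.

```java
/** Hypothetical mutable status holder; not the real HdfsFileStatus. */
final class ReusableStatus {
    long length;
    boolean isDir;
    long modificationTime;

    /** Overwrites every field in place and returns this same instance. */
    ReusableStatus set(long length, boolean isDir, long modificationTime) {
        this.length = length;
        this.isDir = isDir;
        this.modificationTime = modificationTime;
        return this;
    }
}

/** Hypothetical handler-side helper: one status object per thread. */
class StatusHandler {
    private static final ThreadLocal<ReusableStatus> STATUS =
            ThreadLocal.withInitial(ReusableStatus::new);

    /** Fills in the calling thread's instance instead of allocating a new one. */
    static ReusableStatus getFileInfo(long length, boolean isDir, long mtime) {
        return STATUS.get().set(length, isDir, mtime);
    }

    public static void main(String[] args) {
        ReusableStatus first = getFileInfo(10L, false, 1L);
        ReusableStatus second = getFileInfo(20L, true, 2L);
        System.out.println(first == second); // prints "true"
    }
}
```

The trade-off is aliasing: the result of one call is only valid until the same thread makes the next one, so the handler has to finish serializing the response before it reuses the object.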