[jira] [Commented] (HDFS-4489) Use InodeID as an identifier of a file in HDFS protocols and APIs

Nathan Roberts (JIRA) Fri, 26 Apr 2013 10:00:19 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643025#comment-13643025
 ]


Nathan Roberts commented on HDFS-4489:
--------------------------------------

bq. Suresh is willing to do the performance benchmark, but I am trying to 
understand where you are coming from. Yahoo and FB create very large namespaces 
by simply buying more memory and increasing the size of the heap. 

This is not always possible. Some of our namenodes are running at the maximum 
configuration for the box (maximum memory, maximum heap, near maximum 
namespace). For these clusters, upgrading to this feature will require new 
boxes. 

bq. Do you worry about cache pollution when you create 50K more files? 
I don't worry about cache pollution when I create 50K more files. What's 
important is the size of the working set. Inodes are a very popular object 
within the NN; if inodes make up a significant part of our working set, then 
the extra size matters. I don't know whether this is the case or not, which is 
why I think it makes sense to run some benchmarks to make sure we don't see any 
ill effects. With the introduction of YARN, the central RM is rarely the 
bottleneck. Now it's much more common for the NN to be the bottleneck of the 
cluster, and slowing down the bottleneck always needs to be looked at carefully.

bq. Given that the NN heap (many GBs) is so much larger than the cache, does 
the additional inode and inode-map size impact the overall system performance? 
Good question. Let's find out.

bq. Suresh has argued that a 24GB heap grows by 625MB. 
I was using the numbers Todd gathered, where a 7G heap grew by 600MB. When we 
looked at one of our key clusters, we calculated an increase of roughly 7.5%.
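
For a sense of scale, the two sets of figures quoted above can be expressed as a percentage of the heap. This is only a rough sketch: it assumes binary units (1 GB = 1024 MB) and treats the quoted sizes as exact, which they are not. The 7.5% figure for our own cluster was calculated separately and is not derived here.

```python
MB_PER_GB = 1024  # assumption: binary units; the thread does not say which


def growth_percent(heap_gb: float, growth_mb: float) -> float:
    """Return the growth as a percentage of the original heap size."""
    return 100.0 * growth_mb / (heap_gb * MB_PER_GB)


suresh = growth_percent(24, 625)  # figures attributed to Suresh
todd = growth_percent(7, 600)     # figures attributed to Todd

print(f"24 GB heap + 625 MB: {suresh:.2f}%")  # about 2.54%
print(f" 7 GB heap + 600 MB: {todd:.2f}%")    # about 8.37%
```

The same absolute growth looks very different depending on the baseline heap, which is part of why the two sides read the cost differently.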

bq. Looking at the growth in memory of this feature as a percentage of the 
total heap size is a more realistic way of looking at the impact of the growth 
than the growth of an individual data structure like the inode.
Maybe.   


bq. IMHO, not having an inode-map and inode number was a serious limitation in 
the original implementation of NN. I am willing to pay for the extra memory 
given the value inode-id and inode-map bring (as described by Suresh in the 
beginning of this Jira). Permissions, access time, etc. added to the memory cost 
of the NN and were accepted because of the value they bring. 
Certainly agree it is a limitation. We just need to make sure we fully quantify 
all of the costs.  

                
> Use InodeID as an identifier of a file in HDFS protocols and APIs
> --------------------------------------------------------------------
>
>                 Key: HDFS-4489
>                 URL: https://issues.apache.org/jira/browse/HDFS-4489
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Brandon Li
>            Assignee: Brandon Li
>             Fix For: 2.0.5-beta
>
>
> The benefits of using InodeID to uniquely identify a file are manifold. 
> Here are a few of them:
> 1. uniquely identify a file across renames; related JIRAs include HDFS-4258, 
> HDFS-4437.
> 2. modification checks in tools like distcp. Since a file could have been 
> replaced or renamed to, the file name and size combination is not reliable, 
> but the combination of file id and size is unique.
> 3. id based protocol support (e.g., NFS)
> 4. to make the pluggable block placement policy use fileid instead of 
> filename (HDFS-385).
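
Item 2 above can be illustrated with a small sketch. This is not distcp code: `FileStatus`, its fields, and both comparison helpers are hypothetical names invented for the example, and the ids and sizes are made up. It only shows why a (name, size) check can miss a replaced file while an (id, size) check does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStatus:
    """Hypothetical, simplified view of a file; not an HDFS class."""
    path: str
    file_id: int  # stands in for the inode id
    length: int


def unchanged_by_name(old: FileStatus, new: FileStatus) -> bool:
    # Name-and-size check: fooled when a different file takes over the path.
    return (old.path, old.length) == (new.path, new.length)


def unchanged_by_id(old: FileStatus, new: FileStatus) -> bool:
    # Id-and-size check: follows the file itself, regardless of its path.
    return (old.file_id, old.length) == (new.file_id, new.length)


before = FileStatus("/data/part-0", file_id=1001, length=4096)
# A different file of the same size was moved onto the same path.
replaced = FileStatus("/data/part-0", file_id=2002, length=4096)
# The original file was renamed but not otherwise touched.
renamed = FileStatus("/data/part-0.bak", file_id=1001, length=4096)

print(unchanged_by_name(before, replaced))  # True: replacement goes unnoticed
print(unchanged_by_id(before, replaced))    # False: replacement is detected
print(unchanged_by_id(before, renamed))     # True: same file after a rename
```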

