[ https://issues.apache.org/jira/browse/HDFS-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626881#comment-13626881 ]
Suresh Srinivas commented on HDFS-4489:
---------------------------------------

bq. The GSet used for InodeID to INode map is also semi-fixed. Is it allocated similarly to BlocksMap?

Yes. Please see the patch in HDFS-4434. About 1% of heap is used for the GSet.

bq. Simply saying the overhead is insignificant won't convince users. We should explain why the benefit from having this feature justifies the overhead. I don't think on/off switch is necessary.

I think the assertion here is not that the overhead is insignificant. Depending on the details of how the namespace of a system is laid out, I would think the overhead would be anywhere from 2 to 5%. As for the benefits, I laid them out in the main description:
---
This helps in several use cases:
# HDFS can evolve to support ID-based protocols such as NFS. We plan to add an experimental NFS V3 gateway to HDFS using this mechanism. Will post a github link soon.
# InodeID can be used by tools to track a single instance of a file, for caching data or for tracking and checking for modification based on INodeID, in tools like distcp.
# Path cannot identify a unique instance of a file. This causes issues as described in HDFS-4258 and HDFS-4437. It has also been a requirement of many other jiras such as HDFS-385.
# Using InodeID as an identifier instead of path can be more efficient than path-based access.
---

bq. We have a namenode which will not work well if we upgrade to a release with this feature since it will need extra 4-6GB for the steady-state operation. Even if it could absorb the extra memory requirement, we would have to tell users that the namespace limit is X% worse.

Is this because the namenode does not have the RAM? With this change, it is expected that the NN is allocated more memory, say 5%. If this is done, I am not sure why users should be told the namespace limit is X% worse. My rationale, repeating what I said earlier, is that machines are becoming available with more RAM. Adding 5% JVM heap should not be a problem.
In fact, most of the namenodes are configured with enough headroom already and might not even need a change. But if this is a big concern, I am okay with making an additional change to bring the memory consumption down close to zero.

> Use InodeID as as an identifier of a file in HDFS protocols and APIs
> --------------------------------------------------------------------
>
>                 Key: HDFS-4489
>                 URL: https://issues.apache.org/jira/browse/HDFS-4489
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Brandon Li
>            Assignee: Brandon Li
>
> The benefit of using InodeID to uniquely identify a file can be multiple
> folds. Here are a few of them:
> 1. uniquely identify a file across renames, related JIRAs include HDFS-4258,
> HDFS-4437.
> 2. modification checks in tools like distcp. Since a file could have been
> replaced or renamed to, the file name and size combination is not reliable,
> but the combination of file id and size is unique.
> 3. id based protocol support (e.g., NFS)
> 4. to make the pluggable block placement policy use fileid instead of
> filename (HDFS-385).
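
The modification check in item 2 of the issue description can be sketched as follows. This is a minimal, self-contained illustration and not Hadoop code: FileStamp, changedByPath, and changedById are hypothetical names, and the numeric ids are made-up stand-ins for InodeIDs.

```java
// Illustrative sketch only; none of these types exist in Hadoop.
class ModificationCheck {

    /** Hypothetical snapshot of a file's metadata at one point in time. */
    static final class FileStamp {
        final String path;
        final long fileId;   // stands in for the InodeID
        final long length;

        FileStamp(String path, long fileId, long length) {
            this.path = path;
            this.fileId = fileId;
            this.length = length;
        }
    }

    /** Path-based check: reports a change only if path or size differs. */
    static boolean changedByPath(FileStamp a, FileStamp b) {
        return !(a.path.equals(b.path) && a.length == b.length);
    }

    /** Id-based check: reports a change only if file id or size differs. */
    static boolean changedById(FileStamp a, FileStamp b) {
        return !(a.fileId == b.fileId && a.length == b.length);
    }

    public static void main(String[] args) {
        FileStamp before = new FileStamp("/data/report", 1001L, 4096L);

        // The file was replaced by a different file of the same size:
        // same path and length, but a new id.
        FileStamp replaced = new FileStamp("/data/report", 2002L, 4096L);
        System.out.println("replaced, path+size: " + changedByPath(before, replaced));
        System.out.println("replaced, id+size:   " + changedById(before, replaced));

        // The same file was renamed: new path, same id and length.
        FileStamp renamed = new FileStamp("/data/report.old", 1001L, 4096L);
        System.out.println("renamed, path+size:  " + changedByPath(before, renamed));
        System.out.println("renamed, id+size:    " + changedById(before, renamed));
    }
}
```

In this sketch the path-based check misses the replacement and flags the rename as a change, while the id-based check does the opposite, which is the behavior the description argues for.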