[jira] [Commented] (HDFS-4489) Use InodeID as as an identifier of a file in HDFS protocols and APIs

Suresh Srinivas (JIRA) Tue, 09 Apr 2013 10:56:18 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626881#comment-13626881
 ]


Suresh Srinivas commented on HDFS-4489:
---------------------------------------

bq. The GSet used for InodeID to INode map is also semi-fixed. Is it allocated 
similarly to BlocksMap?
Yes. Please see the patch in HDFS-4434. About 1% of heap is used for the GSet.
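
As a rough illustration of what "about 1% of heap" means for a fixed-size map, the sketch below sizes a hash table's slot array from a heap percentage. It is not the code from HDFS-4434. The class name, the `capacityFor` helper, and the 8-bytes-per-reference figure are assumptions made for the example.

```java
public final class GSetCapacitySketch {
    /**
     * Largest power-of-two slot count whose reference array fits within
     * the given percentage of the heap. Hypothetical helper, not HDFS code.
     */
    static int capacityFor(long maxHeapBytes, double percent, int bytesPerReference) {
        if (maxHeapBytes <= 0 || percent <= 0.0 || percent > 100.0 || bytesPerReference <= 0) {
            throw new IllegalArgumentException("invalid arguments");
        }
        // Memory budget for the slot array, e.g. 1% of the heap.
        long budgetBytes = (long) (maxHeapBytes * (percent / 100.0));
        long slots = budgetBytes / bytesPerReference;
        if (slots < 1) {
            return 1;
        }
        // Round down to a power of two and cap at 2^30 so the result fits an int.
        long powerOfTwo = Long.highestOneBit(slots);
        return (int) Math.min(powerOfTwo, 1L << 30);
    }

    public static void main(String[] args) {
        // Assumption: 8 bytes per reference (64-bit JVM without compressed oops).
        long heap = Runtime.getRuntime().maxMemory();
        System.out.println("slots for 1% of this JVM's heap: " + capacityFor(heap, 1.0, 8));
    }
}
```

Because the slot array is allocated once from the configured heap size, its cost stays fixed as files are added. Only the per-inode entries grow with the namespace.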

bq. Simply saying the overhead is insignificant won't convince users. We should 
explain why the benefit from having this feature justifies the overhead. I 
don't think on/off switch is necessary.
I don't think the assertion here is that the overhead is insignificant. 
Depending on how the namespace of a system is laid out, I would expect the 
overhead to be anywhere from 2 to 5%.

As for the benefits, I laid them out in the main description:

---
This helps in several use cases:
# HDFS can evolve to support ID based protocols such as NFS. We plan to add an 
experimental NFS V3 gateway to HDFS using this mechanism. Will post a github 
link soon.
# InodeID can be used by tools to track a single instance of a file, for 
caching data or for tracking and checking modifications based on InodeID, in 
tools like distcp.
# Path cannot identify a unique instance of a file. This causes issues as 
described in HDFS-4258 and HDFS-4437. It has also been a requirement of many 
other jiras such as HDFS-385.
# Using InodeID as an identifier instead of path can be more efficient than 
path-based access.
---
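
To illustrate the second and third points, here is a minimal sketch of why a path cannot identify a file instance while an ID can. All names (`FileRecord`, `unchangedByPath`, `unchangedById`) and the sample IDs and paths are hypothetical. Nothing here is an existing HDFS API.

```java
import java.util.Objects;

public final class FileIdentitySketch {
    /** What a tool might remember about a file from a previous run. */
    static final class FileRecord {
        final long inodeId;  // hypothetical: an ID the namenode would expose
        final String path;
        final long length;

        FileRecord(long inodeId, String path, long length) {
            this.inodeId = inodeId;
            this.path = Objects.requireNonNull(path);
            this.length = length;
        }
    }

    /** Path-based check: fooled when a different file of equal size takes over the path. */
    static boolean unchangedByPath(FileRecord before, FileRecord now) {
        return before.path.equals(now.path) && before.length == now.length;
    }

    /** ID-based check: follows the same file instance across renames. */
    static boolean unchangedById(FileRecord before, FileRecord now) {
        return before.inodeId == now.inodeId && before.length == now.length;
    }

    public static void main(String[] args) {
        FileRecord original = new FileRecord(1001L, "/data/a", 4096L);
        // A different file of the same size now lives at the old path.
        FileRecord replaced = new FileRecord(2002L, "/data/a", 4096L);
        // The original file was renamed but is otherwise untouched.
        FileRecord renamed = new FileRecord(1001L, "/data/b", 4096L);

        System.out.println(unchangedByPath(original, replaced)); // true (false positive)
        System.out.println(unchangedById(original, replaced));   // false
        System.out.println(unchangedByPath(original, renamed));  // false
        System.out.println(unchangedById(original, renamed));    // true
    }
}
```

The sketch only compares ID and length, as in the issue description. It would not detect an in-place change that leaves the length the same.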

bq. We have a namenode which will not work well if we upgrade to a release with 
this feature since it will need extra 4-6GB for the steady-state operation. 
Even if it could absorb the extra memory requirement, we would have to tell 
users that the namespace limit is X% worse.
Is this because the namenode machine does not have spare RAM? With this 
change, the NN is expected to be allocated more memory, say 5%. If that is 
done, I am not sure why users should be told the namespace limit is X% worse.
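
The arithmetic behind this can be sketched as follows, assuming memory use scales linearly with the number of namespace objects. The 100 GB heap is a made-up figure for illustration, not a measurement from any cluster, and the helper names are hypothetical.

```java
public final class HeapOverheadSketch {
    /** Extra heap needed to hold the same namespace, given a fractional overhead. */
    static double extraHeapGb(double currentHeapGb, double overheadFraction) {
        return currentHeapGb * overheadFraction;
    }

    /** Fraction of namespace capacity lost if the heap is left unchanged. */
    static double capacityLossFraction(double overheadFraction) {
        // Each object costs (1 + overhead) times as much, so a fixed heap
        // holds 1 / (1 + overhead) of what it held before.
        return overheadFraction / (1.0 + overheadFraction);
    }

    public static void main(String[] args) {
        double heapGb = 100.0;   // hypothetical namenode heap
        double overhead = 0.05;  // upper end of the 2 to 5% estimate
        System.out.printf("extra heap to keep the same limit: %.1f GB%n",
                extraHeapGb(heapGb, overhead));
        System.out.printf("capacity lost with a fixed heap: %.1f%%%n",
                100.0 * capacityLossFraction(overhead));
    }
}
```

Under that assumption, growing the heap by the overhead fraction leaves the namespace limit where it was. The limit only shrinks if the heap stays fixed.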

My rationale, repeating what I said earlier, is that machines are becoming 
available with more RAM. Adding 5% to the JVM heap should not be a problem. 
In fact, most namenodes are already configured with enough headroom and 
might not even need a change. But if this is a big concern, I am okay with 
making an additional change to bring the memory consumption down close to zero.


                
> Use InodeID as as an identifier of a file in HDFS protocols and APIs
> --------------------------------------------------------------------
>
>                 Key: HDFS-4489
>                 URL: https://issues.apache.org/jira/browse/HDFS-4489
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Brandon Li
>            Assignee: Brandon Li
>
> The benefits of using InodeID to uniquely identify a file are manifold. 
> Here are a few of them:
> 1. uniquely identify a file across renames; related JIRAs include HDFS-4258, 
> HDFS-4437.
> 2. modification checks in tools like distcp. Since a file could have been 
> replaced or renamed to, the file name and size combination is not reliable, 
> but the combination of file id and size is unique.
> 3. id based protocol support (e.g., NFS)
> 4. to make the pluggable block placement policy use fileid instead of 
> filename (HDFS-385).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
