[jira] [Commented] (HDFS-4489) Use InodeID as an identifier of a file in HDFS protocols and APIs

Kihwal Lee (JIRA) Tue, 09 Apr 2013 10:44:16 -0700

    [ https://issues.apache.org/jira/browse/HDFS-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626868#comment-13626868 ]


Kihwal Lee commented on HDFS-4489:
----------------------------------

bq. Please look at the overall increase in memory usage instead of increase over used memory.
Your point would be valid only if the overhead were entirely a fixed amount 
(e.g. GSet).  Since the extra memory consumption grows with the size of the 
namespace, factoring an arbitrary max heap size into the comparison can be 
misleading.  But I agree that the 9% figure has no absolute meaning either: 
with a different inode-to-block ratio, the number will be different. For the 
clusters I have seen, it will be lower. The GSet used for the 
InodeID-to-INode map is also semi-fixed. Is it allocated similarly to 
BlocksMap? 
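
To make the relative-versus-absolute point concrete, here is a rough sketch of the arithmetic. Every constant in it (per-inode cost, per-block cost, extra bytes per inode, the 128GB max heap) is a made-up placeholder rather than a measurement from any namenode, and the class and method names are hypothetical. It only illustrates the shape of the argument: a per-inode overhead measured against used heap depends on the inode-to-block ratio, while the same overhead measured against max heap depends on namespace size and on whatever ceiling was configured.

```java
// Illustrative only: every constant below is a made-up placeholder, not a measurement.
final class OverheadSketch {
    static final long BYTES_PER_INODE = 150;       // hypothetical existing cost per inode
    static final long BYTES_PER_BLOCK = 150;       // hypothetical existing cost per block
    static final long EXTRA_BYTES_PER_INODE = 32;  // hypothetical added cost per inode

    // Heap used by the namespace before the feature, under the placeholder costs.
    static long usedHeap(long inodes, double blocksPerInode) {
        long blocks = (long) (inodes * blocksPerInode);
        return inodes * BYTES_PER_INODE + blocks * BYTES_PER_BLOCK;
    }

    // Extra heap that scales with the number of inodes.
    static long extraHeap(long inodes) {
        return inodes * EXTRA_BYTES_PER_INODE;
    }

    // Overhead as a percentage of heap actually used by the namespace.
    static double percentOfUsed(long inodes, double blocksPerInode) {
        return 100.0 * extraHeap(inodes) / usedHeap(inodes, blocksPerInode);
    }

    // Overhead as a percentage of an arbitrarily chosen max heap.
    static double percentOfMaxHeap(long inodes, long maxHeapBytes) {
        return 100.0 * extraHeap(inodes) / maxHeapBytes;
    }

    public static void main(String[] args) {
        long maxHeap = 128L << 30; // arbitrary configured ceiling (assumption)
        for (long inodes : new long[] {50_000_000L, 100_000_000L, 200_000_000L}) {
            System.out.printf(
                "inodes=%d  vs used (1 block/inode)=%.1f%%  vs used (2 blocks/inode)=%.1f%%  vs max heap=%.1f%%%n",
                inodes,
                percentOfUsed(inodes, 1.0),
                percentOfUsed(inodes, 2.0),
                percentOfMaxHeap(inodes, maxHeap));
        }
    }
}
```

Under these assumptions the percentage of used heap stays the same as the namespace grows and drops as files carry more blocks, whereas the percentage of max heap keeps growing with the inode count.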

In any case, I would not call this insignificant. We have a namenode that will 
not work well if we upgrade to a release with this feature, since it will need 
an extra 4-6GB for steady-state operation. Even if it could absorb the extra 
memory requirement, we would have to tell users that the namespace limit is X% 
worse.  

Simply saying the overhead is insignificant won't convince users. We should 
explain why the benefit of this feature justifies the overhead.  I don't think 
an on/off switch is necessary. 
                
> Use InodeID as an identifier of a file in HDFS protocols and APIs
> -----------------------------------------------------------------
>
>                 Key: HDFS-4489
>                 URL: https://issues.apache.org/jira/browse/HDFS-4489
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Brandon Li
>            Assignee: Brandon Li
>
> Using InodeID to uniquely identify a file has multiple benefits. Here are a 
> few of them:
> 1. uniquely identify a file across renames; related JIRAs include HDFS-4258 
> and HDFS-4437.
> 2. modification checks in tools like distcp. Since a file could have been 
> replaced, or another file renamed to its path, the combination of file name 
> and size is not reliable, but the combination of file id and size is unique.
> 3. id-based protocol support (e.g., NFS)
> 4. to make the pluggable block placement policy use fileid instead of 
> filename (HDFS-385).
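
Item 2 of the description above can be sketched as follows. This is only an illustration of the idea and not the distcp or HDFS API: `FileSnapshot`, its fields, and both comparison methods are hypothetical names, and the ids and sizes are invented. It shows a file at one path being replaced by a different file of the same length, which a name-and-size check misses and an id-and-size check catches.

```java
// Hypothetical types for illustration; not part of HDFS or distcp.
final class IdentitySketch {
    static final class FileSnapshot {
        final String path;   // file name as seen by the tool
        final long fileId;   // inode id (assumed stable across renames)
        final long length;   // file size in bytes

        FileSnapshot(String path, long fileId, long length) {
            this.path = path;
            this.fileId = fileId;
            this.length = length;
        }
    }

    // Name-based check: fooled when another file of equal size takes over the path.
    static boolean unchangedByNameAndSize(FileSnapshot before, FileSnapshot after) {
        return before.path.equals(after.path) && before.length == after.length;
    }

    // Id-based check: a replaced file carries a different inode id.
    static boolean unchangedByIdAndSize(FileSnapshot before, FileSnapshot after) {
        return before.fileId == after.fileId && before.length == after.length;
    }

    public static void main(String[] args) {
        // Invented values: the original file, then a different file renamed onto the same path.
        FileSnapshot before = new FileSnapshot("/data/a", 1001L, 100L);
        FileSnapshot after = new FileSnapshot("/data/a", 2002L, 100L);

        System.out.println("name+size says unchanged: " + unchangedByNameAndSize(before, after));
        System.out.println("id+size says unchanged:   " + unchangedByIdAndSize(before, after));
    }
}
```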

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
