[ https://issues.apache.org/jira/browse/HDFS-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13615575#comment-13615575 ]
Todd Lipcon commented on HDFS-4489:
-----------------------------------

bq. byte[] name - I assume typically ~56 bytes for this. That is (16 bytes object overhead, 8 byte length + bytes that make up file name, say 32)

According to your comment here: https://issues.apache.org/jira/browse/HDFS-1110?focusedCommentId=12861548&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12861548 a typical image with ~50M files will only need ~5M unique name byte[] objects, so I think it's unfair to count the above against the inode. I think you're also adding an extra 8 bytes on the arrays -- the array length, as I understand it, is a field within the 16-byte object header (occupying the second half of the klassId field).

Regardless, this seems like something that's very easy to test rather than try to solve analytically. Do you have results for the additional memory overhead of this map on a large production image? If it's truly 3-5%, that seems reasonable, but I'm afraid it may look closer to 10+% in practice.

> Use InodeID as as an identifier of a file in HDFS protocols and APIs
> --------------------------------------------------------------------
>
>                 Key: HDFS-4489
>                 URL: https://issues.apache.org/jira/browse/HDFS-4489
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Brandon Li
>            Assignee: Brandon Li
>
> The benefit of using InodeID to uniquely identify a file can be multiple
> folds. Here are a few of them:
> 1. uniquely identify a file cross rename, related JIRAs include HDFS-4258,
> HDFS-4437.
> 2. modification checks in tools like distcp. Since a file could have been
> replaced or renamed to, the file name and size combination is not reliable,
> but the combination of file id and size is unique.
> 3. id based protocol support (e.g., NFS)
> 4. to make the pluggable block placement policy use fileid instead of
> filename (HDFS-385).
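To make the name byte[] arithmetic from the comment above concrete, here is a rough back-of-the-envelope sketch. It is not taken from the HDFS code base: the class and method names are hypothetical, and the 16-byte array header (with the length stored inside it) and 8-byte alignment are assumptions that follow the comment's description of the JVM layout. The actual numbers depend on the JVM and its settings, which is why measuring a real image is still the better approach.

```java
// Hypothetical helper, for illustration only; not part of HDFS.
public class NameArrayCost {
    // Assumption: 16-byte array header that already contains the length field.
    static final long ARRAY_HEADER_BYTES = 16;
    // Assumption: objects are padded to 8-byte boundaries.
    static final long ALIGNMENT = 8;

    // Estimated heap size of a byte[] holding payloadBytes bytes.
    static long byteArraySize(long payloadBytes) {
        long raw = ARRAY_HEADER_BYTES + payloadBytes;
        return (raw + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // Name-array bytes charged to each inode when arrays are shared across files.
    static double amortizedBytesPerInode(long files, long uniqueNames, long nameBytes) {
        return (double) uniqueNames * byteArraySize(nameBytes) / files;
    }

    public static void main(String[] args) {
        long files = 50_000_000L;       // ~50M files, from the comment
        long uniqueNames = 5_000_000L;  // ~5M unique name arrays, from the comment
        long nameBytes = 32;            // "say 32" bytes per name, from the quoted estimate

        System.out.println("bytes per name array: " + byteArraySize(nameBytes));
        System.out.println("amortized bytes per inode: "
                + amortizedBytesPerInode(files, uniqueNames, nameBytes));
    }
}
```

Under these assumptions a 32-byte name costs 48 bytes per array rather than 56, and sharing the arrays spreads that cost over roughly ten files each, so the share attributable to a single inode is much smaller than the per-array figure.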