[ https://issues.apache.org/jira/browse/HDFS-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13615665#comment-13615665 ]
Suresh Srinivas commented on HDFS-4489:
---------------------------------------

bq. I think you're also adding an extra 8 bytes on the arrays – the array length as I understand it is a field within the 16 byte object header (occupying the second half of the klassId field).

If you have an authoritative source, please send me that. I cannot understand how a 16 byte object header could have, say, 8 spare bytes to track the array length. Some of my previous instrumentation had led me to conclude that the array length is 4 bytes on a 32-bit JVM and 8 bytes on a 64-bit JVM. See the discussion here - http://www.javamex.com/tutorials/memory/object_memory_usage.shtml.

bq. a typical image with ~50M files will only need ~5M unique name byte[] objects, so I think it's unfair to count the above against the inode.

That is a fair point. But my claim that inodes occupy 1/3rd of the java heap is also an approximation, and in practice I would expect inodes to occupy less than that. I would like to run an experiment on a large production image, but I do not have ready access to one and will have to spend time getting to it. Do you have any?

bq. but I'm afraid it may look closer to 10+% in practice.

I do not think it will be close to 10%, but let's say it is. I do not see much of an issue with it. When we did some of the optimizations earlier, we were not sure how the JVM would behave as the heap got close to 64G, and hence wanted to keep the heap size down. But since then many large installations have successfully gone beyond that size without any issues. Smaller installations should be able to spare, say, 10% extra heap. But if that is not acceptable, here are the alternatives I see:
# Add a configuration option to turn this feature off. Not instantiating the GSet will reduce the overhead by 1/3rd. This is simple to do.
# Make more optimizations at the expense of code complexity. I would like to avoid this. But if it is deemed very important, with some optimizations we can get it close to 0%.
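To make the disagreement above concrete, here is a back-of-the-envelope sketch of the two per-array overhead models being debated: a 16-byte header with a separate 8-byte length field, versus the length packed inside a 16-byte header. The layout numbers and the 8-byte alignment are assumptions about a 64-bit HotSpot JVM, not authoritative figures, and the class and method names are illustrative only.

```java
// Back-of-the-envelope sizing of byte[] objects under the two layout
// models discussed above. All layout constants here are ASSUMPTIONS
// about a 64-bit HotSpot JVM, not measured or authoritative values.
public class ArraySizeEstimate {

    // HotSpot rounds object sizes up to 8-byte alignment.
    static long align8(long bytes) {
        return (bytes + 7) & ~7L;
    }

    // Model A: 16-byte object header plus a separate 8-byte length field.
    static long sizeWithSeparateLength(long payloadBytes) {
        return align8(16 + 8 + payloadBytes);
    }

    // Model B: array length packed into the header, 16 bytes total overhead.
    static long sizeWithPackedLength(long payloadBytes) {
        return align8(16 + payloadBytes);
    }

    public static void main(String[] args) {
        long avgNameLen = 16;        // assumed average file-name length in bytes
        long names = 5_000_000L;     // ~5M unique name byte[] objects (from the thread)
        System.out.println("Model A total: " + names * sizeWithSeparateLength(avgNameLen));
        System.out.println("Model B total: " + names * sizeWithPackedLength(avgNameLen));
    }
}
```

Under these assumed numbers the two models differ by 8 bytes per array, which is exactly the disputed overhead; multiplied across millions of name arrays, that is where the heap-size estimates diverge.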
> Use InodeID as an identifier of a file in HDFS protocols and APIs
> -----------------------------------------------------------------
>
>                 Key: HDFS-4489
>                 URL: https://issues.apache.org/jira/browse/HDFS-4489
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Brandon Li
>            Assignee: Brandon Li
>
> The benefit of using InodeID to uniquely identify a file can be multiple
> fold. Here are a few of them:
> 1. uniquely identify a file across renames; related JIRAs include HDFS-4258,
> HDFS-4437.
> 2. modification checks in tools like distcp. Since a file could have been
> replaced or renamed over, the file name and size combination is not reliable,
> but the combination of file id and size is unique.
> 3. id based protocol support (e.g., NFS)
> 4. to make the pluggable block placement policy use fileid instead of
> filename (HDFS-385).
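Point 2 of the description above can be sketched as a change detector that keys on (fileId, length) instead of (path, length), so a file that was replaced or renamed over is still flagged for re-copy. This is a hypothetical illustration, not actual distcp code; all names here are invented.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of inode-id-based modification checks (point 2 above).
// Not distcp's real implementation; names and structure are illustrative.
public class ChangeDetector {

    /** Snapshot of a file at copy time: its inode id plus its length. */
    static final class FileState {
        final long fileId;
        final long length;
        FileState(long fileId, long length) {
            this.fileId = fileId;
            this.length = length;
        }
    }

    private final Map<String, FileState> lastCopied = new HashMap<>();

    /** Remember what was copied for this path. */
    void recordCopied(String path, long fileId, long length) {
        lastCopied.put(path, new FileState(fileId, length));
    }

    // Re-copy if the path was never copied, or if the inode id changed
    // (the file was replaced or renamed over), or if the length changed.
    // Keying on (fileId, length) catches replacements that a
    // (path, length) comparison would miss.
    boolean needsCopy(String path, long fileId, long length) {
        FileState prev = lastCopied.get(path);
        return prev == null || prev.fileId != fileId || prev.length != length;
    }
}
```

For example, if `/a` was copied with fileId 1001 and length 42, a later `/a` with the same length but fileId 2002 (a replacement) is still detected as changed, which a name-plus-size check would miss.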