[
https://issues.apache.org/jira/browse/HADOOP-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592506#action_12592506
]
Konstantin Shvachko commented on HADOOP-3248:
---------------------------------------------
This is much better than the current code. In my tests it doubles the speed of
saving the fsimage.
But I think we can and should do more here.
This is the sequence of actions proposed by this patch for storing a file name:
# The file name is stored in the name-node memory as byte[].
# INode.getLocalName() creates a new String() from that byte array and returns
it.
# This string is then appended to a StringBuffer.
# The StringBuffer is then scanned once in order to calculate the resulting
length.
# The StringBuffer is then scanned again in order to convert the characters
into UTF8 and write them to the output stream.
The conversion in the last step is not necessary, since the original byte array
is already a UTF8 representation of the file name.
The separate calculation of the length should also be avoided, and the whole
step of creating the String and the StringBuffer should be skipped.
We should simply write the byte[] directly to the output stream.
I propose to use a ByteBuffer instead of a StringBuffer for the full path
names, as in the sketch below. This should also let us eliminate UTF8 without
changing the image format.
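A minimal sketch of what I mean (the class and method names are illustrative,
not the actual FSImage code, and the length-prefixed layout is assumed to match
what new UTF8(fullName).write(out) produces today):

import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

class ImageNameWriterSketch {
  // One reusable buffer for the full path; grown on demand, not reallocated
  // per file.
  private ByteBuffer pathBuffer = ByteBuffer.allocate(8 * 1024);

  // Append one path component, kept as the UTF8 byte[] the name-node already
  // holds in memory.  No String or StringBuffer is created.
  void appendName(byte[] localNameUtf8) {
    int needed = pathBuffer.position() + localNameUtf8.length + 1;
    if (needed > pathBuffer.capacity()) {
      ByteBuffer bigger =
          ByteBuffer.allocate(Math.max(needed, 2 * pathBuffer.capacity()));
      pathBuffer.flip();
      bigger.put(pathBuffer);
      pathBuffer = bigger;
    }
    pathBuffer.put((byte) '/');
    pathBuffer.put(localNameUtf8);
  }

  // Write the accumulated path: a length prefix followed by the raw UTF8
  // bytes, with no re-encoding of characters and no intermediate objects.
  void writePathName(DataOutputStream out) throws IOException {
    out.writeShort(pathBuffer.position());
    out.write(pathBuffer.array(), 0, pathBuffer.position());
  }
}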
> Improve Namenode startup performance
> ------------------------------------
>
> Key: HADOOP-3248
> URL: https://issues.apache.org/jira/browse/HADOOP-3248
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Reporter: girish vaitheeswaran
> Assignee: dhruba borthakur
> Attachments: fastRestarts.patch, fastRestarts.patch, FSImage.patch
>
>
> One of the things that would need to be addressed as part of Namenode
> scalability is the HDFS recovery performance, especially in scenarios where
> the number of files is large. There are instances where the number of files
> is in the vicinity of 20 million, and in such cases the time taken for
> namenode startup is prohibitive. Here are some benchmark numbers on the time
> taken for namenode startup. These times do not include the time to process
> block reports.
> Default scenario for 20 million files with the max java heap size set to
> 14GB: 40 minutes
> Tuning various java options such as young generation size, parallel garbage
> collection, and initial java heap size: 14 minutes
> As can be seen, 14 minutes is still a long time for the namenode to recover,
> and code changes are required to bring this time down further. To this end,
> some prototype optimizations were done to reduce this time. Based on some
> timing analysis, saveImage and loadFSImage were the primary methods that were
> consuming most of the time. Most of the time was being spent doing object
> allocations. The goal of the optimizations is to reduce the number of memory
> allocations as much as possible.
> Optimization 1: saveImage()
> ======================
> Avoid allocation of the UTF8 object.
> Old code
> =======
> new UTF8(fullName).write(out);
> New Code
> ========
> out.writeUTF(fullName)
> Optimization 2: saveImage()
> ======================
> Avoid object allocation of the PermissionStatus object and the FsPermission
> object. This is to be done for directories and for files.
> Old code
> =======
> fileINode.getPermissionStatus().write(out)
> New Code
> =========
> out.writeBytes(fileINode.getUserName())
> out.writeBytes(fileINode.getGroupName())
> out.writeShort(fileINode.getFsPermission().toShort())
> Optimization 3
> ============
> loadImage() could use the same mechanism where we would avoid allocating the
> PermissionStatus object and the FsPermission object.
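> A rough sketch of this read-side change (the INodeLike interface and its
> setter are hypothetical stand-ins, not the actual INode API, and
> readUTF/readShort are just one possible encoding of the three fields):
>
> import java.io.DataInputStream;
> import java.io.IOException;
>
> class PermissionLoaderSketch {
>   // Hypothetical stand-in for the real INode API, just so the sketch compiles.
>   interface INodeLike {
>     void setPermission(String user, String group, short permission);
>   }
>
>   // Read the permission fields for one inode without allocating a
>   // PermissionStatus or FsPermission object per inode.
>   static void loadPermission(DataInputStream in, INodeLike inode)
>       throws IOException {
>     String user = in.readUTF();    // user name
>     String group = in.readUTF();   // group name
>     short perm = in.readShort();   // permission bits as a raw short
>     inode.setPermission(user, group, perm);
>   }
> }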
> Optimization 4
> ============
> A hack was tried out to avoid the cost of object allocation in saveImage(),
> where the fullName was being constructed using string concatenation. This
> optimization also helped improve performance.
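> One way to do this (sketch only; the Node interface below is a hypothetical
> stand-in for the real INode classes) is to reuse a single StringBuilder down
> the recursion instead of concatenating a new fullName String for every entry:
>
> import java.io.DataOutputStream;
> import java.io.IOException;
> import java.util.List;
>
> class SaveImageConcatSketch {
>   // Hypothetical stand-in for the real INode hierarchy.
>   interface Node {
>     String localName();
>     boolean isDirectory();
>     List<Node> children();
>   }
>
>   static void saveDir(StringBuilder path, Node dir, DataOutputStream out)
>       throws IOException {
>     final int mark = path.length();
>     for (Node child : dir.children()) {
>       path.append('/').append(child.localName());
>       out.writeUTF(path.toString()); // one String per entry, no repeated concatenation
>       if (child.isDirectory()) {
>         saveDir(path, child, out);
>       }
>       path.setLength(mark);          // rewind to the parent path for the next sibling
>     }
>   }
> }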
> Overall, these optimizations helped bring the startup time down to slightly
> over 7 minutes. Most of the remaining time is now spent in loadFSImage(),
> since we allocate the INode and INodeDirectory objects. Any further
> optimizations will need to focus on loadFSImage().