[ https://issues.apache.org/jira/browse/HDFS-13693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16951877#comment-16951877 ]

Stephen O'Donnell commented on HDFS-13693:
------------------------------------------

I came across a problem that this change introduces.

I have a tool that generates an XML fsimage with a directory structure like:
{code:java}
/generated/level1_0/level2_0
/generated/level1_0/level2_1
...
/generated/level1_9/level2_0
/generated/level1_10/level2_0
...
/generated/level1_20/level2_0
...{code}
The natural way to generate this XML structure, with two nested loops, produces 
directory entries whose sub-directory names are not sorted alphabetically (for 
example, level1_10 is emitted after level1_9 but sorts before it).
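A quick way to see the ordering problem: the sketch below emits names in numeric loop order and checks them against string order. It assumes, for illustration only, that names are compared byte-wise, which for plain ASCII names like these agrees with Java's {{String.compareTo}}. The class and method names are hypothetical.
{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: compares nested-loop emission order with
// string order. Assumes ASCII names, where String.compareTo matches
// byte-wise lexicographic order.
public class EmissionOrder {
    // Names in the order a naive generator emits them: numeric loop order.
    static List<String> emittedOrder(int count) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            names.add("level1_" + i);
        }
        return names;
    }

    // True if every name sorts at or after its predecessor.
    static boolean isSorted(List<String> names) {
        for (int i = 1; i < names.size(); i++) {
            if (names.get(i - 1).compareTo(names.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("sorted? " + isSorted(emittedOrder(21))); // false
        // "level1_10" sorts before "level1_2" as a string.
        System.out.println("level1_10".compareTo("level1_2") < 0);  // true
    }
}
{code}
Names level1_0 to level1_9 are in order, but level1_10 follows level1_9 in the loop while sorting before it, so any directory with more than ten generated children is emitted out of order.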

When you convert this XML back to an fsimage using OIV and load it into a 
NameNode, the directory structure is broken: this change makes the loading code 
expect the directory entries to be in sorted order, and they are not.
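The failure mode can be illustrated with a simplified, hypothetical model of a directory's child list (not the real {{INodeDirectory}} code). A search-based add keeps the list sorted, while an append-only add, analogous to the loading fast path this change introduces, trusts the input order. Lookups are assumed here to use binary search over a sorted list, so with unsorted input an existing child can go missing.
{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of a directory's child list.
// It is NOT the real INodeDirectory implementation.
public class ChildListModel {
    private final List<String> children = new ArrayList<>();

    // Search-based add: find the insertion point so the list stays sorted.
    void addSearching(String name) {
        int pos = Collections.binarySearch(children, name);
        if (pos < 0) {
            children.add(-pos - 1, name);
        }
    }

    // Append-only add: assumes names already arrive in sorted order.
    void addAtEnd(String name) {
        children.add(name);
    }

    // Lookup by binary search, which is only correct on a sorted list.
    boolean contains(String name) {
        return Collections.binarySearch(children, name) >= 0;
    }

    // Loads "level1_0" .. "level1_(count-1)" in numeric loop order.
    static ChildListModel load(int count, boolean appendOnly) {
        ChildListModel dir = new ChildListModel();
        for (int i = 0; i < count; i++) {
            String name = "level1_" + i;
            if (appendOnly) {
                dir.addAtEnd(name);
            } else {
                dir.addSearching(name);
            }
        }
        return dir;
    }

    public static void main(String[] args) {
        System.out.println(load(11, false).contains("level1_10")); // true
        System.out.println(load(11, true).contains("level1_10"));  // false
    }
}
{code}
With 11 children loaded in loop order, the append-only path leaves level1_10 at the end of the list, where a binary search for it never looks.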

My tool is strictly non-production, but someone might need to hand-edit the XML 
to fix an image, which could leave a directory's entries unsorted and cause the 
same problems. This change effectively requires anything that generates an 
fsimage to emit all directory entries in sorted order. This may not be a big 
problem, but it could cause strange issues that are hard to track down if you 
don't know about this change.
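On the generator side, one way to stay within this constraint is to sort each directory's children before emitting them. This is a sketch with a hypothetical helper; it assumes ASCII names, and names with non-ASCII characters would need to match the NameNode's exact byte comparator, which this sketch does not attempt.
{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical generator helper: returns child names in the order an
// fsimage writer should emit them. Assumes ASCII names, for which
// String.compareTo matches byte-wise lexicographic order.
public class SortedEmitter {
    static List<String> childrenToEmit(int count, String prefix) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            names.add(prefix + i);   // natural nested-loop order
        }
        Collections.sort(names);     // reorder to the sorted order the loader expects
        return names;
    }

    public static void main(String[] args) {
        System.out.println(childrenToEmit(12, "level1_"));
    }
}
{code}
For 12 children this prints [level1_0, level1_1, level1_10, level1_11, level1_2, level1_3, level1_4, level1_5, level1_6, level1_7, level1_8, level1_9], i.e. string order rather than numeric order.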

> Remove unnecessary search in INodeDirectory.addChild during image loading
> -------------------------------------------------------------------------
>
>                 Key: HDFS-13693
>                 URL: https://issues.apache.org/jira/browse/HDFS-13693
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: zhouyingchao
>            Assignee: Lisheng Sun
>            Priority: Major
>             Fix For: 3.3.0, 3.1.4, 3.2.2
>
>         Attachments: HDFS-13693-001.patch, HDFS-13693-002.patch, 
> HDFS-13693-003.patch, HDFS-13693-004.patch, HDFS-13693-005.patch
>
>
> In FSImageFormatPBINode.loadINodeDirectorySection, all child INodes are added 
> to their parent INode's map one by one. Each add searches for a position in 
> the parent's map and then inserts the child at that position. During image 
> loading, however, the search is unnecessary: given the order in which the 
> children are serialized on disk, the insert position should always be the end 
> of the map.
> Testing this patch against an fsimage of a 70PB cluster (200 million files and 
> 300 million blocks) reduced the image loading time from 1210 seconds to 
> 1138 seconds, a saving of about 6% (estimated at up to about 10%).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
