[
https://issues.apache.org/jira/browse/HDFS-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881498#action_12881498
]
Konstantin Shvachko commented on HDFS-1140:
-------------------------------------------
Some review comments:
# {{FSImage.isParent(String, String)}} is not used, please remove.
# Could you please add separators between the methods and, if possible,
JavaDoc descriptions for the new methods.
# {{INode.getPathFromComponents()}} should be {{DFSUtil.byteArray2String()}}.
# {{TestPathComponents}} should use junit 4 style rather than junit 3.
# I'd advise reusing {{U_STR}} instead of allocating {{DeprecatedUTF8 buff}}
directly in {{FSImage.loadFSImage()}}.
To do that, you can provide a convenience method similar to
{{readString()}} or {{readBytes()}}:
{code}
static byte[][] readPathComponents(DataInputStream in) throws IOException {
  U_STR.readFields(in);
  return DFSUtil.bytes2byteArray(U_STR.getBytes(), U_STR.getLength(),
                                 (byte)Path.SEPARATOR_CHAR);
}
{code}
The idea was to remove DeprecatedUTF8 at some point, so it is better to keep
this stuff in one place right after the declaration of U_STR.
# It does not look like {{FSDirectory.addToParent(String src ...)}} is used
anywhere anymore. Could you please verify and remove it if so.
# Same with {{INodeDirectory.addToParent(String path, ...)}} - can we eliminate
it too?
> Speedup INode.getPathComponents
> -------------------------------
>
> Key: HDFS-1140
> URL: https://issues.apache.org/jira/browse/HDFS-1140
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Dmytro Molkov
> Assignee: Dmytro Molkov
> Priority: Minor
> Attachments: HDFS-1140.2.patch, HDFS-1140.3.patch, HDFS-1140.patch
>
>
> When the namenode is loading the image, a significant amount of time is
> spent in DFSUtil.string2Bytes. We have a very specific workload
> here. The path that the namenode calls getPathComponents for shares N - 1
> components with the previous path this method was called for (assuming the
> current path has N components).
> Hence we can improve the image load time by caching the result of previous
> conversion.
> We thought of using a simple LRU cache for components, but in practice
> String.getBytes gets optimized at runtime and an LRU cache doesn't perform
> as well. However, keeping just the latest path components and their
> translation to bytes in two arrays gives quite a performance boost.
> I could get another 20% off of the time to load the image on our cluster (30
> seconds vs 24) and I wrote a simple benchmark that tests performance with and
> without caching.
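The "latest path in two arrays" idea from the description can be sketched roughly as follows. This is only an illustration, not the code in the attached patches: the class name {{LastPathCache}}, its fields, and the {{conversions}} counter are hypothetical, and {{String.getBytes(UTF_8)}} stands in for {{DFSUtil.string2Bytes}}.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: remembers only the most recently converted path.
class LastPathCache {
    // Components of the previous path and their byte translations,
    // kept in two parallel arrays.
    private String[] lastComponents = new String[0];
    private byte[][] lastBytes = new byte[0][];

    // Counts real String -> byte[] conversions, for illustration only.
    int conversions = 0;

    byte[][] getPathComponents(String path) {
        String[] components = path.split("/");
        byte[][] bytes = new byte[components.length][];
        for (int i = 0; i < components.length; i++) {
            if (i < lastComponents.length
                    && components[i].equals(lastComponents[i])) {
                // Same component at the same position as in the previous
                // path: reuse the existing byte[] instead of converting.
                // The array is shared, so callers must not modify it.
                bytes[i] = lastBytes[i];
            } else {
                bytes[i] = components[i].getBytes(StandardCharsets.UTF_8);
                conversions++;
            }
        }
        lastComponents = components;
        lastBytes = bytes;
        return bytes;
    }
}
```

With consecutive paths such as {{/user/dir/file1}} and {{/user/dir/file2}}, only the last component of the second path needs a fresh conversion. Whether this beats plain {{String.getBytes}} depends on the workload, so it should be measured with a benchmark like the one mentioned above.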