[ 
https://issues.apache.org/jira/browse/HDFS-10925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15569139#comment-15569139
 ] 

Yiqun Lin commented on HDFS-10925:
----------------------------------

I looked into this again and have some further thoughts.

I think {{HdfsFileStatus#getSymlink}} is the most frequently called method, 
since it is used whenever a path that contains a symlink is resolved. 
{{FSLinkResolver#resolve}} calls the {{qualifySymlinkTarget}} method:
{code}
  public T resolve(final FileContext fc, final Path path) throws IOException {
    ...
    // Loop until all symlinks are resolved or the limit is reached
    for (boolean isLink = true; isLink;) {
      try {
        in = next(fs, p);
        isLink = false;
      } catch (UnresolvedLinkException e) {
        ...
        // Resolve the first unresolved path component
        p = qualifySymlinkTarget(fs.getUri(), p, fs.getLinkTarget(p));
        fs = fc.getFSofPath(p);
      }
    }
    return in;
  }
{code}
The call to {{fs.getLinkTarget(p)}} then invokes {{getLinkTarget}} in the 
FileSystem. In {{DistributedFileSystem}}, for example, every call to 
{{getLinkTarget}} constructs a new {{FileStatus}} and calls 
{{getSymlink}}. The relevant code:
{code}
  public Path getLinkTarget(final Path f) throws IOException {
    statistics.incrementReadOps(1);
    storageStatistics.incrementOpCounter(OpType.GET_LINK_TARGET);
    final Path absF = fixRelativePart(f);
    return new FileSystemLinkResolver<Path>() {
      @Override
      public Path doCall(final Path p) throws IOException {
        HdfsFileStatus fi = dfs.getFileLinkInfo(getPathName(p));
        if (fi != null) {
          // makeQualified performs the bytes-to-string conversion
          return fi.makeQualified(getUri(), p).getSymlink();
          ...
{code}
So I think the string form is the dominant use case here, and as far as I can 
see the symlink bytes are only used in a few conversion methods. I have 
attached a new patch that makes this change as well. Hi [~daryn], can you 
share your thoughts on this? Thanks!
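
To make the idea concrete, here is a minimal sketch of keeping the string form as the primary field and deriving the bytes only on demand. It is not the attached patch: the class name {{CachedSymlink}} and its fields are hypothetical, and it uses the JDK's UTF-8 encoding in place of the {{DFSUtil}} helpers so that it stands alone.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; this is not the real INodeSymlink
// and not the code in the attached patch.
final class CachedSymlink {
  // The string form is stored directly, on the assumption that
  // most callers ask for the string rather than the bytes.
  private final String target;

  // The byte form is built lazily, the first time it is requested.
  private byte[] targetBytes;

  CachedSymlink(String target) {
    this.target = target;
  }

  // No conversion happens here: the stored string is returned as-is.
  String getSymlinkString() {
    return target;
  }

  // Encodes once and reuses the result; returns a copy so callers
  // cannot modify the cached array.
  synchronized byte[] getSymlink() {
    if (targetBytes == null) {
      targetBytes = target.getBytes(StandardCharsets.UTF_8);
    }
    return targetBytes.clone();
  }
}
```

With this layout, repeated calls to {{getSymlinkString}} return the same {{String}} instance instead of decoding the byte array each time. The cost is holding both forms in memory once the bytes have been requested.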

> Cache symlinkString in INodeSymlink
> -----------------------------------
>
>                 Key: HDFS-10925
>                 URL: https://issues.apache.org/jira/browse/HDFS-10925
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs
>            Reporter: Yiqun Lin
>            Assignee: Yiqun Lin
>            Priority: Minor
>         Attachments: HDFS-10925.001.patch, HDFS-10925.002.patch, 
> HDFS-10925.003.patch
>
>
> In the {{INodeSymlink}} constructor, the input symlink string is converted 
> to a byte array. When {{INodeSymlink#getSymlinkString}} is invoked, the 
> byte array is converted back to a string. Since symlinkString is not 
> cached here, {{DFSUtil.bytes2String}} runs on every call, which seems 
> inefficient.


