Yiqun Lin commented on HDFS-10925:

I looked into this again and have some further thoughts.

I think {{HdfsFileStatus#getSymlink}} is the most frequently called method 
here, since it is used whenever a path containing a symlink is resolved. 
{{FSLinkResolver#resolve}} calls the {{qualifySymlinkTarget}} method:
  public T resolve(final FileContext fc, final Path path) throws IOException {
    // ... (declarations of in, p and fs elided)
    // Loop until all symlinks are resolved or the limit is reached
    for (boolean isLink = true; isLink;) {
      try {
        in = next(fs, p);
        isLink = false;
      } catch (UnresolvedLinkException e) {
        // ... (checks elided)
        // Resolve the first unresolved path component
        p = qualifySymlinkTarget(fs.getUri(), p, fs.getLinkTarget(p));
        fs = fc.getFSofPath(p);
      }
    }
    return in;
  }
The {{fs.getLinkTarget(p)}} call in turn invokes {{getLinkTarget}} in the 
file system implementation. In {{DistributedFileSystem}}, for example, every 
{{getLinkTarget}} call constructs a new {{FileStatus}} and calls 
{{getSymlink}} on it. The relevant code:
  public Path getLinkTarget(final Path f) throws IOException {
    final Path absF = fixRelativePart(f);
    return new FileSystemLinkResolver<Path>() {
      public Path doCall(final Path p) throws IOException {
        HdfsFileStatus fi = dfs.getFileLinkInfo(getPathName(p));
        if (fi != null) {
          // makeQualified converts the symlink bytes to a String
          return fi.makeQualified(getUri(), p).getSymlink();
        }
        // ... (rest of the method elided)
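
To make the cost concrete, here is a toy model of the resolution loop. It is 
not Hadoop code: {{LinkResolveSketch}}, {{LINKS}}, {{addLink}} and 
{{decodeCount}} are hypothetical names, and {{getLinkTarget}} here only stands 
in for the real call. It assumes the target is stored as bytes and decoded on 
every lookup, and counts how many decodes one resolution triggers.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Toy model of client-side symlink resolution; not Hadoop code. */
public class LinkResolveSketch {
  /** Counts byte[] -> String conversions (stand-in for DFSUtil.bytes2String). */
  static int decodeCount = 0;

  /** Hypothetical namespace: link path -> target stored as UTF-8 bytes. */
  static final Map<String, byte[]> LINKS = new HashMap<>();

  static void addLink(String link, String target) {
    LINKS.put(link, target.getBytes(StandardCharsets.UTF_8));
  }

  /** Stand-in for getLinkTarget: decodes the stored bytes on every call. */
  static String getLinkTarget(String path) {
    decodeCount++;
    return new String(LINKS.get(path), StandardCharsets.UTF_8);
  }

  /** Follows links until a non-link path is reached or the limit is hit. */
  static String resolve(String path, int maxLinks) {
    String p = path;
    int count = 0;
    while (LINKS.containsKey(p)) {
      if (count++ >= maxLinks) {
        throw new IllegalStateException("Possible symlink loop at " + path);
      }
      p = getLinkTarget(p);
    }
    return p;
  }

  public static void main(String[] args) {
    addLink("/a", "/b");
    addLink("/b", "/c");
    // Resolving the same path repeatedly repeats the same decodes.
    for (int i = 0; i < 3; i++) {
      System.out.println(resolve("/a", 32));
    }
    System.out.println("decodes: " + decodeCount);
  }
}
```

In this model each hop of each resolution costs one decode, so the number of 
conversions grows with both the length of the link chain and the number of 
lookups.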
So I think the string form is the dominant use case here, while the symlink 
bytes are only used in a few conversion methods. I have attached a new patch 
that makes this change as well. Hi [~daryn], could you share your thoughts on 
this? Thanks!
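
A minimal sketch of the idea, not the attached patch: it assumes a holder that 
keeps the String as the primary representation and encodes to bytes only when 
a caller asks for them. {{SymlinkTargetSketch}}, {{getSymlinkBytes}} and 
{{encodeCount}} are hypothetical names used for illustration.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical holder; not the real INodeSymlink or HdfsFileStatus. */
public class SymlinkTargetSketch {
  /** Primary representation, read directly on the frequent path. */
  private final String target;

  /** Instrumentation for this example only: counts String -> byte[] encodes. */
  int encodeCount = 0;

  SymlinkTargetSketch(String target) {
    this.target = target;
  }

  /** Frequent path: returns the stored String without any conversion. */
  String getSymlinkString() {
    return target;
  }

  /** Infrequent path: encodes on demand for callers that need bytes. */
  byte[] getSymlinkBytes() {
    encodeCount++;
    return target.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    SymlinkTargetSketch link = new SymlinkTargetSketch("/data/target");
    // Repeated string reads do not trigger any conversion.
    for (int i = 0; i < 3; i++) {
      System.out.println(link.getSymlinkString());
    }
    System.out.println("encodes so far: " + link.encodeCount);
    System.out.println("bytes length: " + link.getSymlinkBytes().length);
  }
}
```

The trade-off is memory: a String may occupy more heap than the raw bytes, 
which matters if many such objects are kept in memory at once.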

> Cache symlinkString in INodeSymlink
> -----------------------------------
>                 Key: HDFS-10925
>                 URL: https://issues.apache.org/jira/browse/HDFS-10925
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs
>            Reporter: Yiqun Lin
>            Assignee: Yiqun Lin
>            Priority: Minor
>         Attachments: HDFS-10925.001.patch, HDFS-10925.002.patch, 
> HDFS-10925.003.patch
> In the {{INodeSymlink}} constructor, the input symlink string is converted 
> to a byte array. Every call to {{INodeSymlink#getSymlinkString}} then 
> converts the byte array back to a string. Since symlinkString is not cached, 
> {{DFSUtil.bytes2String}} runs on every call, which seems inefficient.

