[ https://issues.apache.org/jira/browse/HADOOP-16258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827615#comment-16827615 ]

Masatake Iwasaki commented on HADOOP-16258:
-------------------------------------------

After HDFS-13176, every path element is encoded by URLEncoder#encode in
WebHdfsFileSystem#toUrl, and a Path is then created from the encoded string.
Path#initialize calls a multi-argument constructor of
[java.net.URI|https://docs.oracle.com/javase/8/docs/api/java/net/URI.html],
which quotes illegal characters such as ' ' and always quotes '%'. This is why
"dt=1" is encoded twice: URLEncoder turns it into "dt%3D1", and the URI
constructor then quotes the '%' as "%25", yielding "dt%253D1".
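A minimal standalone sketch of the two passes, using only the JDK:
URLEncoder#encode stands in for the per-element encoding in
WebHdfsFileSystem#toUrl, and a five-argument java.net.URI constructor stands in
for what Path#initialize does with the result. The class and method names are
illustrative only, not Hadoop code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

// Illustrative class, not part of Hadoop.
final class DoubleEncodingDemo {

  // First pass: form-encode a single path element ('=' becomes "%3D").
  static String encodeElement(String element) {
    try {
      return URLEncoder.encode(element, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException("UTF-8 is always supported", e);
    }
  }

  // Second pass: the multi-argument URI constructors always quote '%' as "%25".
  static String quoteViaUri(String path) {
    try {
      return new URI(null, null, path, null, null).getRawPath();
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException(e);
    }
  }

  public static void main(String[] args) {
    String once = "/user/hive/warehouse/test_part/" + encodeElement("dt=1");
    String twice = quoteViaUri(once);
    System.out.println(once);  // /user/hive/warehouse/test_part/dt%3D1
    System.out.println(twice); // /user/hive/warehouse/test_part/dt%253D1
  }
}
```

Encoding either once or the other way around would be harmless; it is the
combination of both passes on the same string that produces "%253D".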

HDFS-13582 is a follow-up that tries to apply URLEncoder only to the relevant
path elements, but I think the code does not work as intended.
URLDecoder#decode throws IllegalArgumentException only for a malformed '%'
escape, so {{pathAlreadyEncoded}} in the code below is effectively always true,
and every path element is still encoded as before.
{noformat}
      try {
        fspathUriDecoded = URLDecoder.decode(fspathUri.getPath(), "UTF-8");
        pathAlreadyEncoded = true;
      } catch (IllegalArgumentException ex) {
        LOG.trace("Cannot decode URL encoded file", ex);
      }
      ...
          if (fsPathItem.matches(SPECIAL_FILENAME_CHARACTERS_REGEX) ||
              pathAlreadyEncoded) {
            fsPathEncodedItems.append(URLEncoder.encode(fsPathItem, "UTF-8"));
          } else {
            fsPathEncodedItems.append(fsPathItem);
          }
{noformat}
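The effect of the flag can be reproduced outside Hadoop with a simplified copy
of the quoted logic. This is a sketch, not the real WebHdfsFileSystem code: the
class and helper names are hypothetical, and SPECIAL_CHARS_REGEX is an assumed
stand-in for SPECIAL_FILENAME_CHARACTERS_REGEX, whose actual value may differ.
Since '=' is not in the assumed regex, "dt=1" ends up encoded only because the
flag is true.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical simplification of the quoted snippet, not Hadoop code.
final class AlreadyEncodedCheck {

  // Assumed stand-in for SPECIAL_FILENAME_CHARACTERS_REGEX.
  static final String SPECIAL_CHARS_REGEX = ".*[;+%].*";

  // Mirrors the try/catch: the flag is set whenever decode() does not throw.
  static boolean decodesWithoutError(String path) {
    try {
      URLDecoder.decode(path, "UTF-8");
      return true;
    } catch (IllegalArgumentException ex) {
      return false; // only malformed '%' escapes end up here
    } catch (UnsupportedEncodingException ex) {
      throw new IllegalStateException(ex); // UTF-8 is always supported
    }
  }

  static String encode(String item) {
    try {
      return URLEncoder.encode(item, "UTF-8");
    } catch (UnsupportedEncodingException ex) {
      throw new IllegalStateException(ex);
    }
  }

  // Mirrors the per-element branch: encode if special OR if the flag is set.
  static String encodeItems(String path) {
    boolean pathAlreadyEncoded = decodesWithoutError(path);
    StringBuilder out = new StringBuilder();
    for (String item : path.split("/")) {
      if (item.isEmpty()) {
        continue; // skip the empty element before the leading '/'
      }
      out.append('/');
      if (item.matches(SPECIAL_CHARS_REGEX) || pathAlreadyEncoded) {
        out.append(encode(item));
      } else {
        out.append(item);
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    String path = "/user/hive/warehouse/test_part/dt=1";
    System.out.println(decodesWithoutError(path)); // true
    System.out.println(decodesWithoutError("/tmp/a%zz")); // false
    System.out.println(encodeItems(path)); // /user/hive/warehouse/test_part/dt%3D1
  }
}
```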

> FileSystem.listLocatedStatus for path including '=' encodes it and returns 
> FileNotFoundException
> ------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-16258
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16258
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 3.2.0
>            Reporter: Yuya Ebihara
>            Assignee: Masatake Iwasaki
>            Priority: Minor
>              Labels: webhdfs
>         Attachments: HADOOP-16258.001.patch
>
>
> Recently, we upgraded the Hadoop library from 2.7.7 to 3.2.0, and this issue
> started after the upgrade. When we call FileSystem.listLocatedStatus with the
> location 'webhdfs://hadoop-master:50070/user/hive/warehouse/test_part/dt=1',
> the internal requests are:
>  * 2.7.7
> [http://hadoop-master:50070/webhdfs/v1/user/hive/warehouse/test_part/dt=1?op=LISTSTATUS&user.name=xxx]
>  * 3.2.0
> [http://hadoop-master:50070/webhdfs/v1/user/hive/warehouse/test_part/dt%253D1?op=LISTSTATUS&user.name=xxx]
> As a result, the call fails with a RemoteException wrapping a FileNotFoundException.
> {code:java}
> {"RemoteException":{"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"File
>  /user/hive/warehouse/test_part/dt%3D1 does not exist."}}
> {code}
> Could you please tell me whether this is a bug, and how to avoid it?
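The path in the error message (dt%3D1) is exactly what the requested path
(dt%253D1) becomes after one round of percent-decoding, which is consistent
with the server decoding the request path once and then looking up a file that
does not exist. The sketch below uses URLDecoder as an assumed stand-in for the
server-side decoding (unlike URI path decoding it also maps '+' to a space,
which does not affect this path); the class name is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical illustration; URLDecoder approximates the server's path decoding.
final class ServerSideDecode {

  static String decodeOnce(String s) {
    try {
      return URLDecoder.decode(s, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException(e); // UTF-8 is always supported
    }
  }

  public static void main(String[] args) {
    String sent = "/user/hive/warehouse/test_part/dt%253D1";
    String once = decodeOnce(sent);  // the path named in the FileNotFoundException
    String twice = decodeOnce(once); // the path the client actually meant
    System.out.println(once);  // /user/hive/warehouse/test_part/dt%3D1
    System.out.println(twice); // /user/hive/warehouse/test_part/dt=1
  }
}
```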



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
