[ https://issues.apache.org/jira/browse/HDFS-14379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Íñigo Goiri reassigned HDFS-14379:
----------------------------------
Assignee: Boris Vulikh
> WebHdfsFileSystem.toUrl double encodes characters
> -------------------------------------------------
>
> Key: HDFS-14379
> URL: https://issues.apache.org/jira/browse/HDFS-14379
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs, hdfs-client
> Affects Versions: 3.2.0
> Reporter: Boris Vulikh
> Assignee: Boris Vulikh
> Priority: Major
> Attachments: HDFS-14379-patch1.patch
>
>
> When using DistCp over HttpFS with data that contains Spark partitions,
> DistCp fails to access the partitioned Parquet files because the "=" characters
> in the file path get double-encoded:
> {{"/test/spark/partition/year=2019/month=1/day=1"}}
> becomes
> {{"/test/spark/partition/year%253D2019/month%253D1/day%253D1"}}
> This happens because each {{fsPathItem}} containing the character
> {color:#d04437}'='{color} is encoded by {{URLEncoder.encode(fsPathItem,
> "UTF-8")}} to {color:#d04437}'%3D'{color} and then encoded again by {{new
> Path(....)}} to {color:#d04437}'%253D'{color}.
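The two encoding steps described above can be reproduced outside Hadoop with a small sketch. It assumes that {{new Path(....)}} ends up building its URI through java.net.URI's multi-argument constructor, which the URI Javadoc says always quotes the percent character; java.net.URI stands in for Path here, and the class and method names (DoubleEncodingDemo, encodeItems, rawPathViaUri) are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of Hadoop.
public class DoubleEncodingDemo {

    // Step 1: percent-encode each path item separately, as the report
    // describes for URLEncoder.encode(fsPathItem, "UTF-8"). '=' becomes "%3D".
    static String encodeItems(String path) throws UnsupportedEncodingException {
        List<String> items = new ArrayList<>();
        for (String item : path.split("/")) {
            items.add(URLEncoder.encode(item, "UTF-8"));
        }
        return String.join("/", items);
    }

    // Step 2 (assumption: stands in for new Path(scheme, authority, path)):
    // the multi-argument URI constructor always quotes '%', so an
    // already-encoded "%3D" is turned into "%253D" in the raw path.
    static String rawPathViaUri(String path) throws URISyntaxException {
        return new URI(null, null, path, null, null).getRawPath();
    }

    public static void main(String[] args) throws Exception {
        String original = "/test/spark/partition/year=2019/month=1/day=1";
        String once = encodeItems(original);
        String twice = rawPathViaUri(once);
        System.out.println(once);   // /test/spark/partition/year%3D2019/month%3D1/day%3D1
        System.out.println(twice);  // /test/spark/partition/year%253D2019/month%253D1/day%253D1
        // '=' is a legal URI path character, so the constructor alone leaves it as-is.
        System.out.println(rawPathViaUri(original)); // /test/spark/partition/year=2019/month=1/day=1
    }
}
```

The last line suggests why the second step is the one that doubles the encoding: the URI constructor already escapes whatever a path component requires, so pre-encoding the items with URLEncoder only gives it a '%' to escape again.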