[ 
https://issues.apache.org/jira/browse/ARROW-13048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17362948#comment-17362948
 ] 

David Li commented on ARROW-13048:
----------------------------------

I can confirm this with both 4.0.1 and the latest master. With debug logging 
turned on, I see that the source key has been double-encoded: "x-amz-copy-source: 
foo/a*%253D*1/foo.parquet". Here "=" was first encoded to "%3D", and the "%" was 
then encoded again to "%25". Arrow URL-encodes the source key, as required by 
the SDK, but the SDK itself then URL-encodes the source a second time 
([source|https://github.com/aws/aws-sdk-cpp/blob/bd00fe8d76e2a774c6342f659b49d4458658f4c3/aws-cpp-sdk-s3-crt/source/model/CopyObjectRequest.cpp#L144-L149]).
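
To illustrate the double encoding described above, here is a minimal sketch in Python. It does not call Arrow or the AWS SDK; it uses the standard library's urllib.parse only to mimic two successive URL-encoding passes. The key "foo/a=1/foo.parquet" is inferred from the logged header, and the assumption that the server decodes the header exactly once is mine.

```python
from urllib.parse import quote, unquote

# Hypothetical object key, inferred from the logged header value.
original_key = "foo/a=1/foo.parquet"

# First pass: what the caller does before handing the key to the SDK.
encoded_once = quote(original_key)

# Second pass: what happens if the SDK encodes the already-encoded value.
encoded_twice = quote(encoded_once)

# Assumption: the server URL-decodes the header value exactly once.
key_seen_by_server = unquote(encoded_twice)

print(encoded_once)
print(encoded_twice)
print(key_seen_by_server == original_key)
```

Under that assumption, the key the server looks up still contains "%3D" rather than "=", which would be consistent with the "specified key does not exist" error in the report.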

> [Python] S3FileSystem fails moving filepaths containing = or +
> --------------------------------------------------------------
>
>                 Key: ARROW-13048
>                 URL: https://issues.apache.org/jira/browse/ARROW-13048
>             Project: Apache Arrow
>          Issue Type: Bug
>    Affects Versions: 4.0.1
>            Reporter: Joerg Schneider
>            Priority: Major
>
> Hi Arrow team,
> we have the very common use case of partitioned Parquet tables on S3, 
> written by Spark. The folder names use an equals sign (=) to denote the 
> partition value.
>  
> With PyArrow's S3FileSystem `move` function, it is not possible to move 
> objects in the bucket whose path contains `=` anywhere: 
> {code:java}
> OSError: When copying key 
> 'table/date=202007/part-00000-e39069c2-0ea6-4a62-85ea-8011047cd4f4.c000.snappy.parquet'
>  in bucket 'bucket' to key 
> 'table2/date=202007/part-00000-e39069c2-0ea6-4a62-85ea-8011047cd4f4.c000.snappy.parquet'
>  in bucket 'bucket': AWS Error [code 133]: The specified key does not 
> exist.{code}
> Moving with preemptively URL-quoted paths does not work either, for 
> example:
>  
> {code:java}
> OSError: When copying key 
> 'table/date%3D202007/part-00000-e39069c2-0ea6-4a62-85ea-8011047cd4f4.c000.snappy.parquet'
>  in bucket 'bucket' to key 
> 'table2/date%3D202007/part-00000-e39069c2-0ea6-4a62-85ea-8011047cd4f4.c000.snappy.parquet'
>  in bucket 'bucket': AWS Error [code 133]: The specified key does not 
> exist.{code}
>  
> The source object definitely exists: it was in fact returned by a 
> FileSelector from PyArrow itself and is passed straight to move.
> Is there any configuration option to be set, or special quoting to be used?
> Thanks in advance.
> Joerg



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
