[
https://issues.apache.org/jira/browse/ARROW-18290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633773#comment-17633773
]
Miles Granger edited comment on ARROW-18290 at 11/14/22 1:04 PM:
-----------------------------------------------------------------
Hmm, it seems one way for Arrow to fix this is to do something similar to what
is described in [aws-sdk-cpp #1224 |
https://github.com/aws/aws-sdk-cpp/issues/1224]: subclass our own client
that applies {{URI::URLEncodePath}}, as suggested there:
{quote}
I've found one thing that DOES work. I'm not suggesting that this is the fix,
but it does seem to give the desired behavior, at least for this test case. In
URI::GetURIString(), m_path is escaped using URLEncodePathRFC3986() which
explicitly checks for and does not percent encode the "=". If this call is
replaced with a similar one that does escape the "=" (and only done here.
URLEncodePathRFC3986() is called in a few places), the resulting URL becomes
this:
{quote}
See also [the comment about a new client class |
https://github.com/aws/aws-sdk-cpp/issues/1224#issuecomment-525061745], which
describes a way to circumvent this issue.
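To make the encoding difference concrete, here is a rough Python sketch (not the SDK's code). It assumes the mismatch comes from the two sides encoding "=" in the path differently. The helper names and the SHA-256 "fingerprint" are hypothetical stand-ins; real request signing hashes much more than the path.

```python
import hashlib
from urllib.parse import quote


def encode_path_keep_equals(path: str) -> str:
    # Mimics the behaviour described in the quote: "=" is left unescaped.
    return quote(path, safe="/=")


def encode_path_escape_equals(path: str) -> str:
    # Hypothetical alternative that also percent-encodes "=".
    return quote(path, safe="/")


def fingerprint(encoded_path: str) -> str:
    # Stand-in for a request signature: any hash over the encoded path
    # changes as soon as the encoding differs.
    return hashlib.sha256(encoded_path.encode("utf-8")).hexdigest()


key = "/bucket/spam=ham"
kept = encode_path_keep_equals(key)
escaped = encode_path_escape_equals(key)

print(kept)
print(escaped)
print(fingerprint(kept) == fingerprint(escaped))
```

If one side hashes the first form and the other side hashes the second, the digests cannot match, which would be consistent with the "computed request signature does not match" error in the report.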
What do you think, [~apitrou]?
> [Python] `pyarrow.fs.copy_files` doesn't work if filenames contain special
> characters
> -------------------------------------------------------------------------------------
>
> Key: ARROW-18290
> URL: https://issues.apache.org/jira/browse/ARROW-18290
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 6.0.1
> Reporter: Balaji Veeramani
> Priority: Minor
>
> I can't upload a file called `spam=ham` to a filesystem that emulates an S3
> API. I can work around the issue by renaming the file to `spam-ham`.
> To reproduce, run a filesystem that emulates an S3 API:
> {code:bash}
> docker run -p 9444:9000 scireum/s3-ninja:latest
> {code}
> Authenticate with the filesystem:
> {code:bash}
> export AWS_ACCESS_KEY_ID="AKIAIOSFODNN7EXAMPLE"
> export AWS_SECRET_ACCESS_KEY="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
> {code}
> Then run this Python script:
> {code:python}
> import os
> import tempfile
>
> import pyarrow.fs
>
> # Create a local directory containing an empty file whose name includes "=".
> source = tempfile.mkdtemp()
> file_path = os.path.join(source, "spam=ham")
> open(file_path, "w").close()
>
> # Connect to the S3 emulator started above and copy the directory to it.
> filesystem, path = pyarrow.fs.FileSystem.from_uri(
>     "s3://bucket?scheme=http&endpoint_override=localhost:9444"
> )
> pyarrow.fs.copy_files(source, path, destination_filesystem=filesystem)
> {code}
> You'll get the error
> {code:java}
> OSError: When initiating multiple part upload for key 'spam=ham' in bucket
> 'bucket': AWS Error [code 22]: The computed request signature does not match
> the one provided. Check login credentials. (Expected:
> e70ab9efb620f744abd43d13e8e6846c585a41da543bfb5da67d2fe1ccfd1aaa, Found:
> 648456e3441dad5a014b2981c71b6e69ccac9732bdcdbe2d363d95105d914340)
> {code}
> This issue is motivated by [https://github.com/ray-project/ray/issues/29845].