tustvold opened a new issue, #8009:
URL: https://github.com/apache/arrow-datafusion/issues/8009
### Describe the bug
The behaviour of `ListingTableUrl` with respect to paths containing percent
characters is rather confusing, and I suspect not entirely intentional.
Consider a filesystem containing a file named `bar%2Ffoo`: there is no
obvious way to address this file.
```
let url = ListingTableUrl::parse("file:///foo/bar%2Ffoo").unwrap();
assert_eq!(url.prefix.as_ref(), "foo/bar/foo");
let url = ListingTableUrl::parse("file:///foo/a%252Fb.txt").unwrap();
assert_eq!(url.prefix.as_ref(), "foo/a%252Fb.txt");
let dir = tempdir().unwrap();
let path = dir.path().join("bar%2Ffoo");
std::fs::File::create(&path).unwrap();
let url = ListingTableUrl::parse(path.to_str().unwrap()).unwrap();
assert!(url.prefix.as_ref().ends_with("bar%252Ffoo"), "{}", url.prefix);
```
### To Reproduce
_No response_
### Expected behavior
The "correct" behaviour is that a file URL should be URL-encoded. That is,
according to the URL specification, the correct way to reference this path would
be `file:///foo/a%252Fb.txt`. Similarly, the non-URL version should be
`foo/a%2Fb.txt`.
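To make the encoded/decoded relationship concrete, here is a minimal sketch using only the standard library. The helper names `encode_segment` and `decode_segment` are hypothetical and are not part of DataFusion or `object_store`; the encoder assumes that every byte outside the RFC 3986 unreserved set is escaped, which is stricter than what some URL libraries do.

```rust
/// Percent-encode one path segment. Every byte outside the RFC 3986
/// "unreserved" set (ALPHA / DIGIT / "-" / "." / "_" / "~") becomes %XX.
/// Hypothetical helper for illustration, not DataFusion's implementation.
fn encode_segment(segment: &str) -> String {
    let mut out = String::with_capacity(segment.len());
    for &b in segment.as_bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            // Note that '%' itself is escaped, so "%2F" becomes "%252F".
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Parse a single ASCII hex digit.
fn hex_val(b: u8) -> Option<u8> {
    (b as char).to_digit(16).map(|d| d as u8)
}

/// Decode %XX escapes exactly once. Malformed escapes are copied through
/// unchanged. Hypothetical helper for illustration.
fn decode_segment(segment: &str) -> String {
    let bytes = segment.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex_val(bytes[i + 1]), hex_val(bytes[i + 2])) {
                out.push(hi * 16 + lo);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    String::from_utf8_lossy(&out).into_owned()
}
```

With these helpers, `encode_segment("a%2Fb.txt")` yields `a%252Fb.txt`, and decoding that once gives back `a%2Fb.txt`. Decoding `bar%2Ffoo` yields `bar/foo`, which is why a URL parser that decodes the path cannot address a file whose name literally contains `%2F` unless the caller encodes it first.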
That being said, various tools instead interpret the URL path verbatim:
```
$ touch 'a%2Fb.txt'
$ aws --endpoint-url=http://localhost:4566 s3 cp 'a%2Fb.txt' s3://tustvold/
$ aws --endpoint-url=http://localhost:4566 s3 ls s3://tustvold/
2023-10-31 15:40:13 0 a%2Fb.txt
$ aws --endpoint-url=http://localhost:4566 s3 cp 's3://tustvold/a%2Fb.txt' foo.txt
$ gsutil cp a\%2Fb.txt gs://tustvold
$ gsutil cp gs://tustvold/a\%2Fb.txt test
```
I'm not entirely sure how to classify DataFusion's current behaviour other
than confusing. I think we should probably strive to replicate the behaviour
of tools like the aws-cli and gsutil.
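As a rough illustration of what "verbatim" could mean, the sketch below splits a URL string into its authority (bucket) and path without decoding anything. The function `split_verbatim` is a hypothetical name, and this is only an assumption about the shape of such behaviour, not how aws-cli, gsutil, or DataFusion are implemented.

```rust
/// Split "scheme://authority/path" into (authority, path) without any
/// percent-decoding, so the path is taken exactly as the user typed it.
/// Returns None if the input has no "://" separator.
/// Hypothetical helper for illustration only.
fn split_verbatim(url: &str) -> Option<(&str, &str)> {
    // Drop the scheme; everything after "://" is authority plus path.
    let (_scheme, rest) = url.split_once("://")?;
    // The first '/' separates the authority (e.g. a bucket) from the path.
    // A URL with no path yields an empty path.
    let (authority, path) = rest.split_once('/').unwrap_or((rest, ""));
    Some((authority, path))
}
```

Under this interpretation `s3://tustvold/a%2Fb.txt` maps to the key `a%2Fb.txt`, matching the `aws s3 ls` output above, and `file:///foo/bar%2Ffoo` maps to the path `foo/bar%2Ffoo` with an empty authority.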
### Additional context
_No response_