bolkedebruin commented on code in PR #36410:
URL: https://github.com/apache/airflow/pull/36410#discussion_r1436951129
##########
airflow/io/path.py:
##########
@@ -173,10 +176,14 @@ def bucket(self) -> str:
@property
def key(self) -> str:
if self._url:
- return self._url.path
+ return self._url.path.lstrip(self.sep)
Review Comment:
disclaimer: i'm not sure what is right here either.
Given that documentation on object stores like
[[1]](https://docs.openstack.org/api-ref/object-store/#create-or-replace-object)
[[2]](https://www.educative.io/answers/what-do-keys-mean-in-amazon-s3)
[[3]](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html)
[[4]](https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blobs-list)
all use *relative* identifiers to objects/keys and they construct an URI to to
an object as follows:
```python
uri = f"{protocol}://{container}/{object}" # note the hardcoded '/'
# or in s3 speak
uri = f"{protocol}://{bucket}/{key}" # again the hardcoded '/'
```
Double slashes don't make sense here because if referencing a local file the
bucket would be "" (null/empty):
```python
b = ObjectStoragePath("file:///tmp/data.json")
assert b.bucket == ""
```
Reconstructing the uri would then still work:
```python
uri = f"{b.protocol}://{b.bucket}/{b.path}" # uri = "file:///tmp/data.json"
```
I think the proper way to reference a key/object is relative and thus
stripping a slash.
`path` is obtained from the urlsplit and we always assume a non-relative url
that is passed to ObjectStoragePath. So considering that I think stripping the
leading slashes from a `key` is the correct behavior.
Note: The S3 Hook also strips the leading slash for keys
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]