stayrascal opened a new issue, #7026:
URL: https://github.com/apache/arrow-rs/issues/7026

   **Describe the bug**
   Hi, I noticed that the documentation of 
[Path](https://github.com/apache/arrow-rs/blob/main/object_store/src/path/mod.rs#L96)
 mentioned that `Paths do not contain leading or trailing /`, and the parsing 
logic will strip the prefix and suffix delimiter '/', which will lead some 
problems that the object key has a trailing '/', for example:
   
   1. We cannot head or put an object with a trailing '/', e.g. the upstream of 
object_store might want to create a 'dir', or head an object with a trailing 
'/' created by other application. Since the Path parsing logic will strip the 
trailing '/', the object key in the request is not what we expected.
   
   ```
   // Assume there is an object "a/b/c/" existing in object storage, the bellow 
logic cannot head the object meta.
   let path = Path::parse("a/b/c/") // the actual raw string is "a/b/c"
   storage.head(&x3).await.unwrap()
   ```
   
   2. If we want to delete all objects(include the object has a trailing '/') 
under a given prefix, e.g.  the 
[delete_fixtures](https://github.com/apache/arrow-rs/blob/main/object_store/src/integration.rs#L1106)
 function doesn't work. 
   Even thought the response content of object storage(e.g. the ListResponse of 
S3), but the trailing '/' will be remove during converting `ListResponse` to 
`ListResult` via `Path::parse(xxx)` or `TryFrom::try_from`, which lead to send 
a delete request contains unexpected object key later.
   ```
   pub async fn delete_fixtures(storage: &DynObjectStore) {
       let paths = storage.list(None).map_ok(|meta| { meta.location } ).boxed();
       storage
           .delete_stream(paths)
           .try_collect::<Vec<_>>()
           .await
           .unwrap();
   }
   
   impl TryFrom<ListResponse> for ListResult {
       type Error = crate::Error;
   
       fn try_from(value: ListResponse) -> Result<Self> {
           let common_prefixes = value
               .common_prefixes
               .into_iter()
               .map(|x| Ok(Path::parse(x.prefix)?)) // will remove the trailing 
'/' if exists
               .collect::<Result<_>>()?;
   
           let objects = value
               .contents
               .into_iter()
               .map(TryFrom::try_from) // will remove the trailing '/' if exists
               .collect::<Result<_>>()?;
   
           Ok(Self {
               common_prefixes,
               objects,
           })
       }
   }
   ```
   
   
   
   **To Reproduce**
   <!--
   Steps to reproduce the behavior:
   -->
   
   **Expected behavior**
   I'm not sure the original reason that require strip the suffix delimiter 
during paring Path, and I maybe missing something.
   
   I'm thinking it's reasonable that support suffix delimiter '/' during paring 
Path to support the above all mentioned case, because we cannot assume that all 
objects are created by `object_store`, if there are some objects with trailing 
'/' no matter it's a 'file' or 'dir', there might some unexpected behaviors in 
`object_store`.
   
   Please correct me if I misunderstand something, and I'm happy to fix that if 
we are agree with support a trailing '/'.
   
   **Additional context**
   <!--
   Add any other context about the problem here.
   -->


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to