james-rms commented on issue #384:
URL: 
https://github.com/apache/arrow-rs-object-store/issues/384#issuecomment-2915118345

   > They are, Path::from("\x01.txt").as_ref() == "%01.txt".
   
   Right, I should have said "why should control characters not be handled 
there (and not in Path::parse)".
   
   >  Disallowing empty or relative path segments along with trailing (or 
leading) / is fairly integral to how path resolution functions w.r.t things 
like list_with_delimiter, etc... and therefore needs to be done at this level.
   
   This is reasonable, but does not seem to justify disallowing control 
characters.
   
   > Excluding control characters is sufficient to ensure portability, with the 
sole exception of Windows filesystems
   
   I don't think it's sufficient to ensure portability. Off the top of my head, 
on macOS this script prints `b"lowercase"`:
   ```rust
   use object_store::{ObjectStore, PutPayload};
   
   #[tokio::main]
   async fn main() {
       let prefix = std::env::current_dir().unwrap();
       let store =
           object_store::local::LocalFileSystem::new_with_prefix(&prefix).unwrap();
       let uppercase = object_store::path::Path::parse("A").unwrap();
       let lowercase = object_store::path::Path::parse("a").unwrap();
       store
           .put(&uppercase, PutPayload::from_static(b"uppercase"))
           .await
           .unwrap();
       store
           .put(&lowercase, PutPayload::from_static(b"lowercase"))
           .await
           .unwrap();
   
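       // On a case-insensitive filesystem "A" and "a" name the same file,
       // so the second put overwrites the first.
       let content = store.get(&uppercase).await.unwrap();
       println!("{:?}", content.bytes().await.unwrap());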
   }
   ```
   
   Which I know feels like a gotcha, but it illustrates a point. My stance (and 
I think the stance of this crate) is that you can't rely on the Path type to 
provide portability between object stores: they all have their own rules around 
special characters.
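   
   As one example of such a store-specific rule, here is a minimal std-only 
sketch that probes whether a directory sits on a case-folding filesystem. The 
function name `folds_case` and the scratch directory name are hypothetical, and 
the result depends entirely on the machine it runs on:
   ```rust
   use std::fs;
   use std::io;
   use std::path::Path;
   
   /// Returns true if `dir` sits on a filesystem that treats names differing
   /// only in case as the same file.
   fn folds_case(dir: &Path) -> io::Result<bool> {
       // Hypothetical scratch directory, unique per process.
       let scratch = dir.join(format!("case-probe-{}", std::process::id()));
       fs::create_dir_all(&scratch)?;
   
       fs::write(scratch.join("PROBE"), b"x")?;
       // If the lowercase name resolves, the filesystem folded the case.
       let folded = scratch.join("probe").exists();
   
       fs::remove_dir_all(&scratch)?;
       Ok(folded)
   }
   
   fn main() -> io::Result<()> {
       let folded = folds_case(&std::env::temp_dir())?;
       println!("case-insensitive: {folded}");
       Ok(())
   }
   ```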
   
   Side note: I also don't know if excluding control characters is _necessary_ 
to ensure portability within Unix filesystems - they will break shell tooling, 
but the POSIX API should continue to work on those files (I'm pretty sure).
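   
   A minimal std-only sketch of that claim, assuming a Unix-like system (file 
names there may contain any byte except `/` and NUL; other platforms, Windows 
in particular, may reject the name). The function name and scratch directory 
are hypothetical:
   ```rust
   use std::fs;
   use std::io;
   
   /// Writes `contents` to a file whose name contains the control character
   /// U+0001, reads it back, and cleans up. Returns the bytes that were read.
   fn roundtrip_control_char_name(contents: &[u8]) -> io::Result<Vec<u8>> {
       // Hypothetical scratch directory, unique per process.
       let dir = std::env::temp_dir().join(format!("ctrl-char-demo-{}", std::process::id()));
       fs::create_dir_all(&dir)?;
   
       // "\x01.txt" is the same name used in the Path::from example above.
       let file = dir.join("\x01.txt");
       fs::write(&file, contents)?;
       let read_back = fs::read(&file)?;
   
       fs::remove_dir_all(&dir)?;
       Ok(read_back)
   }
   
   fn main() -> io::Result<()> {
       let bytes = roundtrip_control_char_name(b"hello")?;
       println!("{:?}", String::from_utf8_lossy(&bytes));
       Ok(())
   }
   ```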
   
   > IMO sticking control characters in what are meant to be human readable 
paths is not something that makes a great deal of sense to support. Although I 
could be persuaded if someone had a real use-case.
   
   Fair enough. My use-case feels real to me: I have objects in GCS buckets 
that were created through a sensible process (stress testing with another 
language's SDK), and I want to manage those objects with Rust code. As always, 
it's just my opinion and it's up to you what your code does.


