ion-elgreco commented on issue #5882:
URL: https://github.com/apache/arrow-rs/issues/5882#issuecomment-2288380454

   Hmm, I'm not sure. I started working on delta-rs a year ago, and most of this
FileSystem handling code was already there.
   
   We essentially create a `DeltaFileSystemHandler` in Rust, which we expose to
Python. In Python we create a `DeltaStorageHandler` that inherits from pyarrow's
`FileSystemHandler` and implements its methods by calling into the Rust
`DeltaFileSystemHandler`.
   
   I think PyArrow just calls `read` on an `ObjectInputFile`, which in Rust calls
`get_range` on the underlying object store
(https://github.com/delta-io/delta-rs/blob/c446b1287dedba122b941d8d1d4ae6290aa86d5c/python/src/filesystem.rs#L467-L495):
   
   ```
       fn read<'py>(&mut self, nbytes: Option<i64>, py: Python<'py>) -> PyResult<Bound<'py, PyBytes>> {
           self.check_closed()?;
           let range = match nbytes {
               Some(len) => {
                   let end = i64::min(self.pos + len, self.content_length) as usize;
                   std::ops::Range {
                       start: self.pos as usize,
                       end,
                   }
               }
               _ => std::ops::Range {
                   start: self.pos as usize,
                   end: self.content_length as usize,
               },
           };
           let nbytes = (range.end - range.start) as i64;
           self.pos += nbytes;
           let data = if nbytes > 0 {
               py.allow_threads(|| {
                   rt().block_on(self.store.get_range(&self.path, range))
                       .map_err(PythonError::from)
               })?
           } else {
               "".into()
           };
           // TODO: PyBytes copies the buffer. If we move away from the limited
           // CPython API (the stable C API), we could implement the buffer
           // protocol for bytes::Bytes and return this zero-copy.
           Ok(PyBytes::new_bound(py, data.as_ref()))
       }
   ```
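
To make the position and range handling in `read` easier to follow, here is a minimal sketch that mirrors the same arithmetic against an in-memory buffer instead of an object store. `InMemoryFile` and `next_range` are hypothetical names made up for illustration and are not part of delta-rs or object_store. Like the original, it assumes `nbytes` is non-negative.

```rust
use std::ops::Range;

/// Hypothetical in-memory stand-in for the remote object (not a delta-rs type).
struct InMemoryFile {
    data: Vec<u8>,
    pos: i64,
}

impl InMemoryFile {
    fn new(data: Vec<u8>) -> Self {
        Self { data, pos: 0 }
    }

    /// The byte range a `read(nbytes)` call at the current position would
    /// request: clamped to the content length, or up to the end for `None`.
    fn next_range(&self, nbytes: Option<i64>) -> Range<usize> {
        let content_length = self.data.len() as i64;
        let end = match nbytes {
            Some(len) => i64::min(self.pos + len, content_length),
            None => content_length,
        };
        self.pos as usize..end as usize
    }

    /// Advances the position by the number of bytes actually covered by the
    /// range, then returns those bytes (the real code calls `get_range` here).
    fn read(&mut self, nbytes: Option<i64>) -> Vec<u8> {
        let range = self.next_range(nbytes);
        self.pos += (range.end - range.start) as i64;
        self.data[range].to_vec()
    }
}
```

So each `read` call from PyArrow turns into exactly one range request, and a read past the end of the file is simply truncated to what is left.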
   
   Here `rt()` just returns the runtime, creating it first if it doesn't exist yet.
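
For reference, a lazily created, process-wide runtime is commonly written with `std::sync::OnceLock`. The sketch below shows that pattern only; I have not checked that delta-rs does it this way. `FakeRuntime` and `BUILDS` are hypothetical stand-ins (a real version would hold something like a Tokio runtime) so the example needs nothing beyond the standard library.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for an async runtime; not a real runtime type.
struct FakeRuntime {
    id: usize,
}

/// Counts how many times the runtime has been constructed.
static BUILDS: AtomicUsize = AtomicUsize::new(0);

/// Returns the shared runtime, constructing it on the first call only.
fn rt() -> &'static FakeRuntime {
    static RT: OnceLock<FakeRuntime> = OnceLock::new();
    RT.get_or_init(|| FakeRuntime {
        id: BUILDS.fetch_add(1, Ordering::SeqCst),
    })
}
```

Every caller gets a reference to the same instance, so repeated `read` calls do not pay the cost of building a new runtime.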

