ion-elgreco commented on issue #5882: URL: https://github.com/apache/arrow-rs/issues/5882#issuecomment-2288380454
Hmm, I am not sure. I started working on delta-rs a year ago, and most of this FileSystem handling code was already there.

We essentially create a DeltaFileSystemHandler in Rust, which we expose to Python. In Python we create a DeltaStorageHandler that subclasses pyarrow's FileSystemHandler, and we implement its methods to call into the Rust DeltaFileSystemHandler.

I think PyArrow just calls read on an ObjectInputFile, which in Rust calls `get_range` on the underlying object store (https://github.com/delta-io/delta-rs/blob/c446b1287dedba122b941d8d1d4ae6290aa86d5c/python/src/filesystem.rs#L467-L495):

```
fn read<'py>(&mut self, nbytes: Option<i64>, py: Python<'py>) -> PyResult<Bound<'py, PyBytes>> {
    self.check_closed()?;
    let range = match nbytes {
        Some(len) => {
            let end = i64::min(self.pos + len, self.content_length) as usize;
            std::ops::Range {
                start: self.pos as usize,
                end,
            }
        }
        _ => std::ops::Range {
            start: self.pos as usize,
            end: self.content_length as usize,
        },
    };
    let nbytes = (range.end - range.start) as i64;
    self.pos += nbytes;
    let data = if nbytes > 0 {
        py.allow_threads(|| {
            rt().block_on(self.store.get_range(&self.path, range))
                .map_err(PythonError::from)
        })?
    } else {
        "".into()
    };
    // TODO: PyBytes copies the buffer. If we move away from the limited CPython
    // API (the stable C API), we could implement the buffer protocol for
    // bytes::Bytes and return this zero-copy.
    Ok(PyBytes::new_bound(py, data.as_ref()))
}
```

Here `rt()` just creates a runtime if it doesn't exist yet.
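To make the range handling in `read` easier to follow on its own, here is a minimal std-only sketch of just that step: clamp the requested length to the object size, return the byte range, and advance the position. `Cursor` and `next_range` are hypothetical names for illustration and are not part of delta-rs; the real code passes the resulting range to `get_range` on the store.

```rust
use std::ops::Range;

/// Hypothetical stand-in for the position state kept by the file object.
struct Cursor {
    pos: i64,
    content_length: i64,
}

impl Cursor {
    /// Compute the byte range for the next read and advance `pos`.
    /// `None` means "read everything up to the end of the object".
    fn next_range(&mut self, nbytes: Option<i64>) -> Range<usize> {
        let end = match nbytes {
            // Never read past the end of the object.
            Some(len) => i64::min(self.pos + len, self.content_length),
            None => self.content_length,
        };
        let range = self.pos as usize..end as usize;
        // Advance by the number of bytes the range actually covers.
        self.pos += (range.end - range.start) as i64;
        range
    }
}

fn main() {
    let mut cursor = Cursor { pos: 0, content_length: 10 };
    // A request that fits is returned unchanged.
    assert_eq!(cursor.next_range(Some(4)), 0..4);
    // A request past the end is clamped to the object size.
    assert_eq!(cursor.next_range(Some(100)), 4..10);
    // At the end, the range is empty, so no request needs to be issued.
    assert!(cursor.next_range(None).is_empty());
}
```

The empty range in the last case corresponds to the `nbytes > 0` check in the snippet above, which skips the call to the store entirely.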