JackKelly commented on issue #4631:
URL: https://github.com/apache/arrow-rs/issues/4631#issuecomment-1663792866

   OK, cool, that's good to know. Thank you for your quick reply. No worries at 
all if `object_store` isn't the right place for this functionality.
   
   Just to make sure we're on the same page, let me give a little more detail 
about what I'd ultimately like to do...
   
   First, some context: [Zarr](https://zarr.dev/) has been around for a while. 
As you probably know, the main idea behind Zarr is very simple: we take a large 
multi-dimensional array and save it to disk as compressed, multi-dimensional 
chunks. The user can request an arbitrary slice of the overall array, and Zarr 
will load the appropriate chunks, decompress them, and merge them into a single 
`ndarray`. `Zarr-Python`, the main implementation of Zarr, is currently 
single-threaded.
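   (As a toy illustration of the chunk arithmetic involved, here's a 1-D sketch 
in Rust; the function name is made up, and real Zarr does this per dimension of 
an N-D array before decompressing and merging the selected chunks:)

```rust
use std::ops::Range;

/// Toy 1-D version of the chunk arithmetic: given a requested element
/// slice `start..stop` and a fixed chunk length, return the range of
/// chunk indices that must be loaded and decompressed. Real Zarr does
/// this per dimension of an N-D array.
fn chunks_for_slice(start: usize, stop: usize, chunk_len: usize) -> Range<usize> {
    let first = start / chunk_len;
    let last = (stop + chunk_len - 1) / chunk_len; // exclusive, rounded up
    first..last
}

fn main() {
    // Elements 1000..5000 of an array stored in 2048-element chunks live
    // in chunks 0, 1 and 2.
    assert_eq!(chunks_for_slice(1000, 5000, 2048), 0..3);
}
```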
   
   We're now exploring ways to use multiple CPU cores in parallel to load and 
decompress each Zarr chunk and copy it into a "final" array, as fast as 
possible. (Many Zarr users would benefit if Zarr could max out the hardware.)
   
   If we were to implement our own IO backend using `io_uring`, we might first 
submit our queue of, say, 1 million read operations to the kernel. Then we'd 
have a thread pool (or perhaps an async executor) with roughly as many threads 
as there are logical CPU cores. Each worker thread would run a loop which grabs 
data from the `io_uring` completion queue, immediately decompresses the chunk, 
and then - while the decompressed data is still in the CPU cache - writes the 
decompressed chunk into the final array in RAM. So we'd need the load, 
decompression, and copy steps to happen in very quick succession, and ideally 
within a single thread per chunk (to make the code as "cache-friendly" as 
possible).
   
   Would you say that `object_store` isn't the right place to implement this 
batched, parallel "load-decompress-copy" functionality? Even if `object_store` 
implemented an `io_uring` backend, my guess is that it wouldn't be appropriate 
to modify `object_store` to allow processing of chunk _n-1_ whilst chunk _n_ is 
still being loaded. (If that makes sense?!) Instead, we'd first call 
`object_store`'s `get_ranges` function. Then we'd `await` the `Future` returned 
by `get_ranges`, which only returns data once _all_ the chunks have been 
loaded. So we couldn't decompress chunk _n-1_ whilst simultaneously loading 
chunk _n_. Is that right?
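   For example, this is roughly how I'd expect to use the existing API 
(assuming the `get_ranges` signature that takes a slice of byte ranges and 
resolves to one `Bytes` buffer per range; the store prefix, chunk path and 
offsets below are made up):

```rust
use std::ops::Range;

use object_store::{local::LocalFileSystem, path::Path, ObjectStore};

#[tokio::main]
async fn main() -> object_store::Result<()> {
    // Hypothetical local store and chunk file, just for illustration.
    let store = LocalFileSystem::new_with_prefix("/data/zarr")?;
    let location = Path::from("my_array/0.0.0");

    // Byte ranges of the chunks we want (hypothetical offsets).
    let ranges: Vec<Range<usize>> = (0..1_000usize)
        .map(|i| i * 4096..(i + 1) * 4096)
        .collect();

    // This Future resolves only once *every* range has been fetched, so
    // decompression of chunk n-1 cannot overlap the read of chunk n.
    let compressed_chunks = store.get_ranges(&location, &ranges).await?;

    // Only now, after all ranges have arrived, could we start decompressing.
    for compressed_chunk in &compressed_chunks {
        let _ = compressed_chunk.len();
    }
    Ok(())
}
```

With this shape of API, the earliest-loaded chunk just sits in RAM until the 
whole batch completes, which is exactly the overlap we're hoping to exploit.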

