JackKelly commented on issue #4631: URL: https://github.com/apache/arrow-rs/issues/4631#issuecomment-1663792866
OK, cool, that's good to know. Thank you for your quick reply. No worries at all if `object_store` isn't the right place for this functionality. Just to make sure, please let me give a little more detail about what I'd ultimately like to do.

First, some context: [Zarr](https://zarr.dev/) has been around for a while. As you probably know, the main idea behind Zarr is very simple: we take a large multi-dimensional array and save it to disk as multi-dimensional, compressed chunks. The user can request an arbitrary slice of the overall array, and Zarr will load the appropriate chunks, decompress them, and merge them into a single `ndarray`.

`Zarr-Python`, the main implementation of Zarr, is currently single-threaded. We're now exploring ways to use multiple CPU cores in parallel to load, decompress, and copy each decompressed Zarr chunk into a "final" array, as fast as possible. (Many Zarr users would benefit if Zarr could max out the hardware.)

If we were to implement our own IO backend using `io_uring`, we might first submit our queue of, say, 1 million read operations to the kernel. Then we'd have a thread pool (or perhaps an async executor) with roughly as many threads as there are logical CPU cores. Each worker thread would run a loop which starts by grabbing data from the `io_uring` completion queue, immediately decompresses the chunk, and then, while the decompressed data is still in the CPU cache, writes the decompressed chunk into the final array in RAM. So we'd need the load, decompression, and copy steps to happen in very quick succession, and ideally within a single thread per chunk, to make the code as cache-friendly as possible. (A rough sketch of this worker loop is below.)

Would you say that `object_store` isn't the right place to implement this batched, parallel "load-decompress-copy" functionality? Even if `object_store` implemented an `io_uring` backend, my guess is that it wouldn't be appropriate to modify `object_store` to allow processing to be done on chunk _n-1_ whilst chunk _n_ is still being loaded. (If that makes sense?!) Instead, we'd first call `object_store`'s `get_ranges` function, then `await` the `Future` returned by `get_ranges`, which only returns data once _all_ the chunks have been loaded. So we couldn't simultaneously decompress chunk _n-1_ whilst loading chunk _n_. Is that right? (The second sketch below shows the kind of per-chunk pipelining I have in mind.)
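For concreteness, here's a minimal sketch of that worker loop using the `io-uring` crate (so, outside `object_store` entirely). This is just to illustrate the shape of the idea, assuming roughly the `io-uring` 0.6 API; the file path, chunk offsets, and `decompress_into_final_array` are all hypothetical placeholders:

```rust
// Cargo.toml (assumed): io-uring = "0.6"
use std::fs::File;
use std::os::unix::io::AsRawFd;

use io_uring::{opcode, types, IoUring};

/// Hypothetical placeholder: decompress one chunk and copy it into its slot
/// in the final output array.
fn decompress_into_final_array(chunk_idx: usize, compressed: &[u8]) {
    let _ = (chunk_idx, compressed);
}

fn main() -> std::io::Result<()> {
    // Hypothetical chunk layout: (byte offset, compressed length) per chunk.
    let chunks: Vec<(u64, usize)> = vec![(0, 4096), (4096, 4096), (8192, 4096)];

    let file = File::open("data.zarr/0.0.0")?; // hypothetical chunk file
    let mut ring = IoUring::new(256)?; // queue depth
    let mut buffers: Vec<Vec<u8>> =
        chunks.iter().map(|&(_, len)| vec![0u8; len]).collect();

    // Submit one read per chunk; `user_data` tags each completion with its
    // chunk index so we know which buffer it belongs to.
    for (i, (&(offset, len), buf)) in chunks.iter().zip(buffers.iter_mut()).enumerate() {
        let sqe = opcode::Read::new(types::Fd(file.as_raw_fd()), buf.as_mut_ptr(), len as u32)
            .offset(offset)
            .build()
            .user_data(i as u64);
        unsafe { ring.submission().push(&sqe).expect("submission queue full") };
    }
    ring.submit()?;

    // The worker loop: reap a completion, then immediately decompress that
    // chunk while its bytes are still warm in cache. A real design would run
    // this loop on several threads, each with its own ring (or sharing one
    // behind a lock).
    let mut remaining = chunks.len();
    while remaining > 0 {
        ring.submit_and_wait(1)?; // block until at least one completion is ready
        for cqe in ring.completion() {
            assert!(cqe.result() >= 0, "read failed: errno {}", -cqe.result());
            let idx = cqe.user_data() as usize;
            decompress_into_final_array(idx, &buffers[idx]);
            remaining -= 1;
        }
    }
    Ok(())
}
```

The key property is that the read, the decompression, and the copy into the final array all happen back-to-back on the same thread, per chunk.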

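And here's a hedged sketch of the alternative I mean with today's `object_store` API: instead of one `get_ranges` call that only resolves when everything has arrived, issue one `get_range` per chunk and keep several chunk pipelines in flight, so chunk _n-1_ can be decompressing while chunk _n_ is still downloading. `decompress` is a hypothetical placeholder for the real codec, and `tokio::task::spawn_blocking` assumes a Tokio runtime:

```rust
// Assumed crates: object_store, futures, tokio. The signature shown matches
// the crate around the time of this issue: `get_range(&Path, Range<usize>)`
// returning `Result<Bytes>`.
use std::ops::Range;
use std::sync::Arc;

use futures::stream::{self, StreamExt};
use object_store::{path::Path, ObjectStore};

/// Hypothetical placeholder for the real codec (blosc, zstd, ...).
fn decompress(compressed: &[u8]) -> Vec<u8> {
    compressed.to_vec()
}

/// Fetch each chunk with its own `get_range` call and decompress it as soon
/// as it arrives, instead of awaiting one `get_ranges` call for everything.
async fn load_and_decompress(
    store: Arc<dyn ObjectStore>,
    location: Path,
    ranges: Vec<Range<usize>>,
) -> object_store::Result<Vec<Vec<u8>>> {
    let results: Vec<object_store::Result<Vec<u8>>> = stream::iter(ranges)
        .map(|range| {
            let store = Arc::clone(&store);
            let location = location.clone();
            async move {
                // One read per chunk.
                let compressed = store.get_range(&location, range).await?;
                // Decompression is CPU-bound; run it on a blocking thread so
                // it doesn't stall the async executor.
                let decompressed =
                    tokio::task::spawn_blocking(move || decompress(&compressed))
                        .await
                        .expect("decompression task panicked");
                Ok(decompressed)
            }
        })
        // Keep up to 16 chunk pipelines in flight at once, preserving order.
        .buffered(16)
        .collect()
        .await;
    results.into_iter().collect()
}
```

With `buffered(16)`, up to 16 reads overlap and each chunk's decompression starts as soon as its bytes land, which I think gets most of the pipelining without any changes to `object_store` itself. What it doesn't give me is the coalescing of adjacent ranges that `get_ranges` does, or the cache-friendly same-thread handoff from the `io_uring` sketch above.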