tustvold commented on PR #2677: URL: https://github.com/apache/arrow-datafusion/pull/2677#issuecomment-1170942273
> I am not clear about the "whereas master interleaves the IO and decoding". I think master uses blocking IO, so decode must wait for IO. This patch uses interleaving with async functions to reduce the blocked IO.

Master interleaves IO at the page level, reading individual pages as required and blocking the calling thread as it does so. This branch instead performs async IO, fetching column chunks into memory without blocking threads. This is significantly better for object stores, but will perform "worse" for certain workloads accessing local files, where the approach on master may be faster, though with the obvious drawback of blocking threads.

> if we first integrated the object store abstraction into the repository.

I would be fine waiting until the donation to arrow-rs goes through (https://github.com/influxdata/object_store_rs/issues/41), but I had hoped that, given this intent has been clearly broadcast, we could just get this in rather than waiting the 3 or so weeks the process will take. What do you think?
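To make the page-level versus column-chunk distinction concrete, here is a rough, synchronous sketch using only the Rust standard library. It is not the code on master or in this branch: `CountingReader`, `decode_page_at_a_time`, and `decode_after_fetching_chunk` are hypothetical names, "decoding" is just summing bytes, and the async aspect is left out entirely. It only illustrates how many IO requests each strategy issues for the same pages.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Hypothetical wrapper that counts how many range reads hit the source.
struct CountingReader<R> {
    inner: R,
    reads: usize,
}

impl<R: Read + Seek> CountingReader<R> {
    /// Read `len` bytes starting at `offset`, counting it as one IO request.
    fn read_range(&mut self, offset: u64, len: usize) -> io::Result<Vec<u8>> {
        self.inner.seek(SeekFrom::Start(offset))?;
        let mut buf = vec![0u8; len];
        self.inner.read_exact(&mut buf)?;
        self.reads += 1;
        Ok(buf)
    }
}

/// Page-level strategy: one IO request per page, issued as decoding needs it.
fn decode_page_at_a_time<R: Read + Seek>(
    src: &mut CountingReader<R>,
    pages: &[(u64, usize)], // (offset, length) of each page, assumed contiguous
) -> io::Result<u64> {
    let mut sum = 0u64;
    for &(offset, len) in pages {
        let page = src.read_range(offset, len)?; // caller waits here for every page
        sum += page.iter().map(|&b| u64::from(b)).sum::<u64>();
    }
    Ok(sum)
}

/// Chunk-level strategy: one IO request for the whole column chunk,
/// then every page is decoded from memory.
fn decode_after_fetching_chunk<R: Read + Seek>(
    src: &mut CountingReader<R>,
    pages: &[(u64, usize)],
) -> io::Result<u64> {
    let start = pages.first().map_or(0, |p| p.0);
    let end = pages.last().map_or(0, |p| p.0 + p.1 as u64);
    let chunk = src.read_range(start, (end - start) as usize)?;
    let mut sum = 0u64;
    for &(offset, len) in pages {
        let lo = (offset - start) as usize;
        sum += chunk[lo..lo + len].iter().map(|&b| u64::from(b)).sum::<u64>();
    }
    Ok(sum)
}

fn main() -> io::Result<()> {
    let data: Vec<u8> = (0u8..16).collect();
    let pages = [(0u64, 4usize), (4, 4), (8, 8)];

    let mut paged = CountingReader { inner: Cursor::new(data.clone()), reads: 0 };
    let a = decode_page_at_a_time(&mut paged, &pages)?;

    let mut chunked = CountingReader { inner: Cursor::new(data), reads: 0 };
    let b = decode_after_fetching_chunk(&mut chunked, &pages)?;

    println!("page-level: result {a}, {} reads", paged.reads);
    println!("chunk-level: result {b}, {} reads", chunked.reads);
    Ok(())
}
```

Against an object store each read would be a separate network request, so a single larger fetch tends to be preferable. Against a local file the per-page reads are cheap and avoid buffering the whole chunk, which is the trade-off described above.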
