yjshen edited a comment on pull request #811: URL: https://github.com/apache/arrow-datafusion/pull/811#issuecomment-901679869
@alamb @andygrove @Dandandan @jorgecarleitao @rdettai While making the remote storage system's object listing and data reading API async, a design choice has come up. It might be quite important, and I'd love to have your suggestions:

### To which level should I propagate async?

The question arises because once we have async directory listing, we can have async logical plans and an async table provider, and from there an async DataFrame / context API.

Two alternatives are available:

1. Limit async to just `listing` / `metadata_fetch` / file `read`, wrap a sync version over these async functions, and keep most of the user-facing API untouched (keeping the PR as lean as possible).
2. Propagate the async API all the way up and ultimately change the user-facing API, including DataFrame and ExecutionContext (which involves huge user-facing API changes).

Currently, this PR takes the first approach: all APIs in `ObjectStore` / `ObjectReader` / `SourceRootDescriptor` are natively async, and each async function is wrapped into a sync one, trying to keep other parts of the project untouched. Many thanks to @houqp for guiding me along the way.

Does approach 1 make sense to you?

### If I take approach 1, how should the sync version function be constructed?

This PR wraps the async counterparts so that there is a single implementation of each piece of functionality, and therefore relies on `futures::executor::block_on` to bridge async to sync. However, this approach is flawed: `block_on` may block the only thread in tokio, so the future inside never gets a chance to run, and the call hangs forever if the tokio runtime is not a multi-threaded one. (I have temporarily changed the related tests to use `#[tokio::test(flavor = "multi_thread", worker_threads = 2)]` to avoid hanging.)

Do you have any suggestions on this?
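
For illustration, the sync-over-async bridging described under approach 1 can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the PR's code: `list_files_async` and `list_files` are hypothetical names, and the hand-rolled `block_on` below is only a std-only stand-in for `futures::executor::block_on`, not its real implementation.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Unparks the blocked thread when the future signals it can make progress.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Simplified stand-in for `futures::executor::block_on` (an assumption for
/// this sketch): polls the future on the calling thread, parking in between.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            // The calling thread does nothing else while parked. If the
            // future can only be woken by work that must run on this same
            // thread, nothing will ever unpark it and the call hangs.
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical async listing call; it returns immediately here, whereas a
/// real object store would await network I/O.
async fn list_files_async(prefix: &str) -> Vec<String> {
    vec![
        format!("{prefix}/part-0.parquet"),
        format!("{prefix}/part-1.parquet"),
    ]
}

/// Sync wrapper that reuses the single async implementation.
fn list_files(prefix: &str) -> Vec<String> {
    block_on(list_files_async(prefix))
}
```

Calling `list_files("data")` from ordinary sync code completes, because the future never returns `Pending`. The hang described above would arise when such a wrapper is called from the only worker thread of a single-threaded tokio runtime while the inner future waits on I/O or timers that the same thread is responsible for driving: the thread stays parked inside `block_on`, so the wake-up never arrives.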
