Hi everyone! If you are reading this, it means that you fell into the trap of my catchy (but meaningless) title!

This discussion somewhat relates to [1]. DataFusion has recently made its top-level "actions" (collect, write...) async. The problem is that most of the codebase is not async (in particular Parquet [2]), which means that you have to make an async context work together with a sync one. This works okay... until it doesn't!

I am trying to read data into DataFusion from S3 using Rusoto, an AWS SDK for Rust. The problem is that this SDK is itself async. This means that you end up with the following layers:

DataFusion (async) -> Parquet (sync) -> Rusoto (async)

As you might know, Tokio does not support blocking on a runtime from within a runtime.

This raises a set of questions:
- Does anybody know a way to make such a setup work?
- Making Parquet async is extremely difficult and would be a breaking change. Should we try to do it anyway [2]?
- Is the benefit of having DataFusion async really that big? Should we maybe have both a sync and an async API?

Thanks everybody and have a wonderful day.

Regards,
Remi

[1] https://issues.apache.org/jira/browse/ARROW-9464
[2] https://issues.apache.org/jira/browse/ARROW-10307