Re: [DataFusion] Question about async/await?

Evan Chan Mon, 13 Sep 2021 10:10:39 -0700

The other suggestion would be to have a way to monitor when the CPU-bound
thread pool saturates, which can cause queues to back up into the main
dispatch async threads as well. I.e., if the CPU thread pool fills up, there
may be some spillover to watch out for.
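A minimal sketch of the monitoring described above, in plain std Rust, assuming a fixed-size CPU pool fed by a bounded queue. `CpuPool`, `try_submit`, and `queue_depth` are hypothetical names for illustration, not DataFusion or tokio APIs. Bounding the queue makes saturation visible as a rejected submission plus a depth gauge, rather than an unbounded backlog that silently pushes back onto the async dispatch threads.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};
use std::sync::{Arc, Mutex};
use std::thread;

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical CPU-bound pool with a bounded queue and a queue-depth gauge.
struct CpuPool {
    sender: Option<SyncSender<Job>>,
    queued: Arc<AtomicUsize>, // jobs accepted but not yet picked up by a worker
    workers: Vec<thread::JoinHandle<()>>,
}

impl CpuPool {
    fn new(threads: usize, queue_capacity: usize) -> Self {
        let (sender, receiver) = sync_channel::<Job>(queue_capacity);
        let receiver = Arc::new(Mutex::new(receiver));
        let queued = Arc::new(AtomicUsize::new(0));
        let workers = (0..threads)
            .map(|_| {
                let receiver = Arc::clone(&receiver);
                let queued = Arc::clone(&queued);
                thread::spawn(move || loop {
                    // The lock guard is a temporary, so it is released as soon
                    // as `recv` returns, before the job runs.
                    let next = receiver.lock().unwrap().recv();
                    match next {
                        Ok(job) => {
                            queued.fetch_sub(1, Ordering::SeqCst);
                            job();
                        }
                        Err(_) => break, // all senders dropped and queue drained
                    }
                })
            })
            .collect();
        CpuPool { sender: Some(sender), queued, workers }
    }

    /// Current queue depth: the number to export as a metric and alert on.
    fn queue_depth(&self) -> usize {
        self.queued.load(Ordering::SeqCst)
    }

    /// Non-blocking submit. `Err(job)` means the pool is saturated, so the
    /// caller (e.g. an async dispatch thread) can record it or shed load
    /// instead of blocking while the backlog spills over.
    fn try_submit(&self, job: Job) -> Result<(), Job> {
        // Count before sending so a worker can never decrement below zero.
        self.queued.fetch_add(1, Ordering::SeqCst);
        match self.sender.as_ref().expect("pool running").try_send(job) {
            Ok(()) => Ok(()),
            Err(TrySendError::Full(job)) | Err(TrySendError::Disconnected(job)) => {
                self.queued.fetch_sub(1, Ordering::SeqCst);
                Err(job)
            }
        }
    }
}

impl Drop for CpuPool {
    fn drop(&mut self) {
        drop(self.sender.take()); // close the queue; workers exit once it drains
        for worker in self.workers.drain(..) {
            worker.join().unwrap();
        }
    }
}
```

In a real service, `queue_depth()` would feed a metrics system, and the rejection path is where a caller could log, retry later, or apply backpressure instead of piling more work onto the async threads.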


-Evan

> On Sep 13, 2021, at 3:35 AM, Andrew Lamb <al...@influxdata.com> wrote:
> 
> I have found the suggestion in the tokio docs to avoid using tokio for
> CPU-bound work very confusing. I think the core suggestion is not to use
> the same threadpool for IO- and CPU-bound work (which makes a lot of sense),
> but it is perfectly feasible to create multiple tokio threadpools
> (`Runtimes`) in the same process.
> 
> I have filed a PR[1] with tokio to try and clarify the docs in this matter.
> 
> Andrew
> 
> P.S. If you want an example of how to create multiple tokio threadpools,
> there is a link to `DedicatedExecutor` in the ticket description.
> 
> [1] https://github.com/tokio-rs/tokio/pull/4105
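To illustrate the point that separate pools can coexist in one process, here is a sketch using two independent std thread pools, one for IO-style work and one for CPU-bound work. It is only an analogue: `NamedPool` and its `spawn` method are hypothetical, and this is not the `DedicatedExecutor` linked from the PR. With tokio itself, the corresponding step would be building a second `Runtime`, for example via `tokio::runtime::Builder::new_multi_thread()`.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical named thread pool; several can live in the same process.
struct NamedPool {
    sender: Option<Sender<Job>>,
    workers: Vec<thread::JoinHandle<()>>,
}

impl NamedPool {
    fn new(name: &str, threads: usize) -> Self {
        let (sender, receiver) = channel::<Job>();
        let receiver = Arc::new(Mutex::new(receiver));
        let workers = (0..threads)
            .map(|i| {
                let receiver = Arc::clone(&receiver);
                // Name each worker after its pool so it is easy to tell
                // CPU threads from IO threads in profilers and logs.
                thread::Builder::new()
                    .name(format!("{}-{}", name, i))
                    .spawn(move || loop {
                        let next = receiver.lock().unwrap().recv();
                        match next {
                            Ok(job) => job(),
                            Err(_) => break, // pool dropped: shut down
                        }
                    })
                    .expect("failed to spawn worker thread")
            })
            .collect();
        NamedPool { sender: Some(sender), workers }
    }

    /// Run `f` on this pool. The caller gets a receiver for the result
    /// immediately and only blocks if it chooses to call `recv()`.
    fn spawn<T, F>(&self, f: F) -> Receiver<T>
    where
        T: Send + 'static,
        F: FnOnce() -> T + Send + 'static,
    {
        let (tx, rx) = channel();
        let job: Job = Box::new(move || {
            // Ignore the error if the caller dropped the receiver.
            let _ = tx.send(f());
        });
        self.sender.as_ref().unwrap().send(job).expect("pool shut down");
        rx
    }
}

impl Drop for NamedPool {
    fn drop(&mut self) {
        drop(self.sender.take()); // close the queue
        for worker in self.workers.drain(..) {
            worker.join().unwrap();
        }
    }
}
```

The design choice mirrors the advice above: CPU-heavy work goes to its own pool, so a long computation occupies a `cpu-*` thread rather than one of the threads serving IO.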
> 
> On Mon, Sep 13, 2021 at 12:15 AM QP Hou <houqp....@gmail.com> wrote:
> 
>> Hi Renjie,
>> 
>> If by datafusion benchmarks, you are referring to the code in the
>> datafusion/benches folder, then those benchmarks are executed with
>> tokio runtime.
>> 
>> You are correct that one should schedule compute-bound work as a
>> separate task on a dedicated thread to avoid blocking the async
>> runtime's main thread. This practice applies not just to tokio but to
>> async runtimes in general.
>> 
>> The tokio runtime used in the benchmarks is created with
>> `tokio::runtime::Runtime::new()`. In datafusion/Cargo.toml, tokio is
>> pulled in with the `rt-multi-thread` feature flag, so I believe it
>> creates the runtime with a multi-thread scheduler by default. I don't
>> think it matters that much for the benchmarks, though, because the
>> benchmark code calls `Runtime::block_on` to execute the async query
>> code.
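For readers unfamiliar with what "block on" means, here is a minimal sketch assuming only the standard library: the calling thread polls the future, parks while it is pending, and is unparked by the waker. `block_on`, `spawn_cpu`, and `SpawnCpu` are hypothetical helpers, not tokio's implementation; a real runtime's `Runtime::block_on` also drives timers, IO, and spawned tasks. `spawn_cpu` additionally shows the pattern described above: the compute-bound work runs on a dedicated thread, and the async side only waits for its result.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Hypothetical minimal `block_on`: drive `fut` to completion on this thread.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            // Sleep until woken; a spurious wakeup just means we poll again.
            Poll::Pending => thread::park(),
        }
    }
}

/// State shared between the CPU thread and the future that awaits it.
struct Shared<T> {
    result: Option<T>,
    waker: Option<Waker>,
}

/// Hypothetical future resolved by a dedicated thread running `f`.
struct SpawnCpu<T> {
    shared: Arc<Mutex<Shared<T>>>,
}

fn spawn_cpu<T, F>(f: F) -> SpawnCpu<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let shared = Arc::new(Mutex::new(Shared { result: None, waker: None }));
    let worker_side = Arc::clone(&shared);
    thread::spawn(move || {
        let value = f(); // the CPU-bound work, off the async thread
        let mut state = worker_side.lock().unwrap();
        state.result = Some(value);
        if let Some(waker) = state.waker.take() {
            waker.wake();
        }
    });
    SpawnCpu { shared }
}

impl<T> Future for SpawnCpu<T> {
    type Output = T;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        let mut state = self.shared.lock().unwrap();
        match state.result.take() {
            Some(value) => Poll::Ready(value),
            None => {
                // Not done yet: remember who to wake when the result lands.
                state.waker = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}
```

Because the result and the waker are guarded by the same mutex, the worker either finds a registered waker to wake or the next poll finds the result, so no completion is lost.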
>> 
>> On Sat, Sep 11, 2021 at 7:38 PM Renjie Liu <liurenjie2...@gmail.com>
>> wrote:
>>> 
>>> Hi, all:
>>> I see that the executor trait's methods are declared async. I have
>>> several questions:
>>> 1. What async/await runtime is used in benchmarking?
>>> 2. Tokio is the most popular async/await runtime, and its docs suggest
>>> putting long-running tasks in a separate thread pool rather than
>>> running them on the tokio runtime directly; see
>>> <https://docs.rs/tokio/1.11.0/tokio/>:
>>> 
>>>> If your code is CPU-bound and you wish to limit the number of threads
>>>> used to run it, you should run it on another thread pool such as rayon
>>>> <https://docs.rs/rayon>.
>>>> 
>>> So my second question is: did you test against a thread-pool execution
>>> mode?
>>> 
>>> I would greatly appreciate answers to these questions.
>>> --
>>> Renjie Liu
>>> Software Engineer, MVAD
>>
