Hi Sasha,

For case 2, I don't see why we need a back-pressure mechanism. Let's say
there is an IO thread. All we need is a queue with a defined capacity that
feeds data from the IO thread to the Read task.
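
As a rough sketch of what I mean (plain Python rather than Arrow code; the
names io_thread, read_task and run_pipeline are made up for illustration and
are not part of any Arrow API), the IO thread pushes batches into a queue
with a fixed capacity and the Read task pulls from it. The blocking put
stalls the IO thread whenever the queue is full, so at most `capacity`
batches are buffered at a time:

```python
import queue
import threading

# Unique end-of-input marker (hypothetical convention for this sketch).
SENTINEL = object()


def io_thread(batches, q):
    """Hypothetical IO thread: pushes each batch into the bounded queue."""
    for batch in batches:
        q.put(batch)  # blocks while the queue already holds `capacity` items
    q.put(SENTINEL)  # tell the consumer there is nothing more to read


def read_task(q, process):
    """Hypothetical Read task: pulls batches until the sentinel arrives."""
    results = []
    while True:
        batch = q.get()
        if batch is SENTINEL:
            break
        results.append(process(batch))
    return results


def run_pipeline(batches, process, capacity=4):
    """Wire one IO thread to one Read task through a queue of fixed capacity."""
    q = queue.Queue(maxsize=capacity)
    producer = threading.Thread(target=io_thread, args=(batches, q))
    producer.start()
    results = read_task(q, process)
    producer.join()
    return results
```

For example, run_pipeline(range(10), lambda b: b * 2, capacity=2) processes
all ten batches in order while never holding more than two in the queue.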

Supun..

On Fri, May 20, 2022 at 8:25 PM Sasha Krassovsky <krassovskysa...@gmail.com>
wrote:

> Hi Supun,
> Roughly what happens now is #2. However, in your example, it may be the
> case that we are reading CSV data from disk faster than we are transcoding
> it into Parquet and writing it. Note that we attempt to use the full disk
> bandwidth and assign batches to cores once the reads are done, so ideally a
> core is never blocked on IO. In other words, even if we have only 2 cores,
> we may kick off 100 batch reads and process each one when its read completes
> and a core is available. This is where backpressure is needed: to prevent
> us from having this huge number of reads piling up and filling up memory
> faster than we can process.
>
> Sasha Krassovsky
>
> > On May 20, 2022, at 20:09, Supun Kamburugamuve <su...@apache.org>
> > wrote:
> >
>


-- 
Supun Kamburugamuve
