Hi Sasha, For case 2, I don't see why we need a back-pressure mechanism. Lets say there is an IO thread. All we need is a queue with a defined capacity that feeds data from IO thread to the Read task.
Supun.. On Fri, May 20, 2022 at 8:25 PM Sasha Krassovsky <krassovskysa...@gmail.com> wrote: > Hi Supun, > Roughly what happens now is #2. However, in your example, it may be the > case that we are reading CSV data from disk faster than we are transcoding > it into Parquet and writing it. Note that we attempt to use the full disk > bandwidth and assign batches to cores once the reads are done, so ideally a > core is never blocked on IO. In other words, even if we have 2 only cores, > we may kick off 100 batch reads and process them when the read completes > and a core is available. This is where backpressure is needed: to prevent > us from having this huge number of reads piling up and filling up memory > faster than we can process. > > Sasha Krassovsky > > > 20 мая 2022 г., в 20:09, Supun Kamburugamuve <su...@apache.org> > написал(а): > > > -- Supun Kamburugamuve