gabotechs commented on PR #19760:
URL: https://github.com/apache/datafusion/pull/19760#issuecomment-3833440299

   > This seems in many ways quite similar to what RepartitionExec w/ spilling 
does. Have you had a chance to poke at that code?
   
   Yes, in fact a small chunk of the code there still shows my name in git 
blame. It is indeed similar in the sense that there is some per-partition 
buffering, but that code looks to be in a more difficult situation: it 
needs to be able to buffer potentially indefinitely due to the unbounded nature 
of RepartitionExec (correct me if I'm wrong, it's been a while since I looked 
at that code), whereas the code in this PR can afford to have bounded channels.
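   
   To make the bounded/unbounded distinction concrete, here is a toy sketch using only std channels. It is not the RepartitionExec or BufferExec code, and `fill_unbounded` / `fill_bounded` are made-up names for illustration:

```rust
use std::sync::mpsc::{channel, sync_channel, TrySendError};

/// Hypothetical helper: pushes `n` items into an unbounded channel with no
/// consumer running, then reports how many items ended up queued.
fn fill_unbounded(n: usize) -> usize {
    let (tx, rx) = channel();
    for i in 0..n {
        // `send` on an unbounded channel never blocks; the queue just grows.
        tx.send(i).expect("receiver is still alive");
    }
    drop(tx); // close the channel so `rx.iter()` terminates
    rx.iter().count()
}

/// Hypothetical helper: tries to push `n` items into a bounded channel with no
/// consumer running, and reports how many were accepted before it filled up.
fn fill_bounded(n: usize, capacity: usize) -> usize {
    let (tx, _rx) = sync_channel(capacity);
    let mut accepted = 0;
    for i in 0..n {
        match tx.try_send(i) {
            Ok(()) => accepted += 1,
            // A full channel is the backpressure signal: stop producing.
            Err(TrySendError::Full(_)) => break,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    accepted
}
```

   With no consumer, the unbounded variant queues every item it is given (memory grows with the input), while the bounded variant stops accepting once `capacity` items are in flight.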
   
   At first sight I do not see a lot of opportunities for reusing code in both 
places due to the different requirements, but happy to listen to ideas.
   
   > Maybe both are needed though? As in: you want buffering and prefetching.
   
   Another difference with RepartitionExec is that BufferExec eagerly polls 
its children regardless of whether its own stream has been polled, whereas 
RepartitionExec waits for the first poll before doing any work. This means 
that RepartitionExec does not prefetch, but BufferExec does.
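   
   A rough way to picture the eager vs. lazy difference, as a toy sketch with std threads and a bounded channel. This is an analogy, not the actual DataFusion code (which is async), and `counting_source`, `EagerBuffer` and `LazyBuffer` are made-up names:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{sync_channel, Receiver};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a child stream: yields 0..n and records in
/// `produced` how many items have been pulled from it so far.
fn counting_source(
    n: usize,
    produced: Arc<AtomicUsize>,
) -> impl Iterator<Item = usize> + Send {
    (0..n).inspect(move |_| {
        produced.fetch_add(1, Ordering::SeqCst);
    })
}

/// Eager (prefetching): starts draining the source as soon as it is built,
/// into a bounded channel, before anyone calls `next()`.
struct EagerBuffer {
    rx: Receiver<usize>,
}

impl EagerBuffer {
    fn new(source: impl Iterator<Item = usize> + Send + 'static, capacity: usize) -> Self {
        let (tx, rx) = sync_channel(capacity);
        thread::spawn(move || {
            for item in source {
                // Blocks when the buffer is full; stops if the consumer is gone.
                if tx.send(item).is_err() {
                    break;
                }
            }
        });
        Self { rx }
    }
}

impl Iterator for EagerBuffer {
    type Item = usize;
    fn next(&mut self) -> Option<usize> {
        self.rx.recv().ok()
    }
}

/// Lazy: holds on to the source and only starts the background work on the
/// first call to `next()`.
struct LazyBuffer {
    pending: Option<(Box<dyn Iterator<Item = usize> + Send>, usize)>,
    inner: Option<EagerBuffer>,
}

impl LazyBuffer {
    fn new(source: impl Iterator<Item = usize> + Send + 'static, capacity: usize) -> Self {
        Self {
            pending: Some((Box::new(source), capacity)),
            inner: None,
        }
    }
}

impl Iterator for LazyBuffer {
    type Item = usize;
    fn next(&mut self) -> Option<usize> {
        // First poll: this is the point where work actually begins.
        if let Some((source, capacity)) = self.pending.take() {
            self.inner = Some(EagerBuffer::new(source, capacity));
        }
        self.inner.as_mut()?.next()
    }
}
```

   In this sketch, constructing a `LazyBuffer` pulls nothing from the source until the first `next()`, while an `EagerBuffer` starts pulling immediately and stalls after at most `capacity + 1` items (the buffered ones plus the one waiting to be sent) until the consumer reads.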
   
   > The advantage I see of buffering at the Parquet level is that the reader 
can do fancy things like planning to fetch a larger contiguous chunk of data 
from object storage
   
   👍 I can see this being beneficial. My intention was to first use this in 
https://github.com/apache/datafusion/pull/19761, but the BufferExec node is 
something you are supposed to be able to place wherever you want. In fact, we 
do use it in more scenarios at DataDog.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

