alamb commented on PR #9697:
URL: https://github.com/apache/arrow-rs/pull/9697#issuecomment-4281589075

   > This gives us fairly clean decoupling: the IO layer can do whatever it 
wants, but can't push unsolicited buffers, which seems like a reasonable 
constraint.
   > WDYT @alamb?
   
   I think pushing unsolicited buffers is important to support a prefetching 
use case, though it is not clear to me whether your proposal precludes that. 
   
   I also worry that it will be complicated to track exactly which ranges are 
and are not needed, and that it adds a new, non-trivial requirement for the 
decoder to do range tracking. 
   
   I really liked the theoretical simplicity of your initial watermark, and I 
feel like we should be able to leverage the fact that the biggest unit of 
buffering is a row group. As soon as the decoder is done with a row group, any 
data prefetched for it can be released. 
   
   Maybe we could add some way for the decoder to report, at a coarser 
granularity, what it might still request.
   
   For example, maybe we could add an API like 
`PushDecoder::remaining_row_groups` that returns the row groups the decoder 
may still read in the future. 
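A rough sketch of the shape such an API could take, using a toy stand-in rather than the real decoder. `ToyDecoder`, `finish_row_group`, and this `remaining_row_groups` signature are all hypothetical and are not part of arrow-rs today. The property that matters is that a row group no longer reported as remaining will never be requested again.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a push decoder that reads a fixed set of
/// row groups (e.g. the ones that survived pruning).
pub struct ToyDecoder {
    /// Row groups the decoder has not finished yet, in ascending index order.
    pending: BTreeSet<usize>,
}

impl ToyDecoder {
    pub fn new(row_groups: impl IntoIterator<Item = usize>) -> Self {
        Self {
            pending: row_groups.into_iter().collect(),
        }
    }

    /// Hypothetical API: row groups the decoder may still request data for.
    /// Anything not yielded here will not be requested again.
    pub fn remaining_row_groups(&self) -> impl Iterator<Item = usize> + '_ {
        self.pending.iter().copied()
    }

    /// In a real decoder this would happen internally once every batch for
    /// the row group has been emitted.
    pub fn finish_row_group(&mut self, rg: usize) {
        self.pending.remove(&rg);
    }
}
```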
   
   The I/O subsystem can then map whatever prefetching it has done onto those 
row groups, and when a row group is no longer needed it can cancel I/O, flush 
buffers, or whatever 🤔 And with the ability to clear the buffers in the push 
decoder, you can control memory at a very fine grain.
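A minimal sketch of the I/O side, assuming the I/O layer tags each prefetched byte range with the row group it was fetched for. `PrefetchCache`, `release_unneeded`, and `buffered_bytes` are hypothetical names; the `remaining` argument stands in for whatever the proposed `PushDecoder::remaining_row_groups` would return.

```rust
use std::collections::{BTreeSet, HashMap};
use std::ops::Range;

/// Hypothetical I/O-side cache of prefetched bytes, grouped by the row
/// group they were fetched for. The I/O layer owns this mapping; the
/// decoder only reports which row groups it may still need.
#[derive(Default)]
pub struct PrefetchCache {
    by_row_group: HashMap<usize, Vec<(Range<u64>, Vec<u8>)>>,
}

impl PrefetchCache {
    /// Record a prefetched byte range belonging to row group `rg`.
    pub fn insert(&mut self, rg: usize, range: Range<u64>, data: Vec<u8>) {
        self.by_row_group.entry(rg).or_default().push((range, data));
    }

    /// Drop every buffer prefetched for a row group that is not in
    /// `remaining`, returning the number of bytes released. A real I/O
    /// layer could also cancel in-flight requests for those row groups here.
    pub fn release_unneeded(&mut self, remaining: impl IntoIterator<Item = usize>) -> usize {
        let keep: BTreeSet<usize> = remaining.into_iter().collect();
        let mut released = 0;
        self.by_row_group.retain(|rg, bufs| {
            if keep.contains(rg) {
                true
            } else {
                released += bufs.iter().map(|(_, d)| d.len()).sum::<usize>();
                false
            }
        });
        released
    }

    /// Total bytes currently held across all row groups.
    pub fn buffered_bytes(&self) -> usize {
        self.by_row_group
            .values()
            .flatten()
            .map(|(_, d)| d.len())
            .sum()
    }
}
```

Because buffers are keyed by row group, releasing memory reduces to a set difference against what the decoder reports, rather than exact per-range bookkeeping inside the decoder.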


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.