2010YOUY01 commented on PR #16996:
URL: https://github.com/apache/datafusion/pull/16996#issuecomment-3173879649

   > Btw, would it be possible to calculate the cost of the join like in https://www.youtube.com/watch?v=RcEW0P8iVTc ?
   > 
   > The video shows multiple implementations of NLJ, how to calculate their cost, and pseudocode for each; something similar would be super useful for the community and for further improvements.
   > 
   > From what I understood, the left side is scanned once and buffered entirely in memory; what about the right-side scans? Perhaps in the future we could process the left input in blocks of batches to prevent OOM.
   
   Yes, that's exactly the idea for the future memory-limited NLJ implementation: for each group of left batches buffered under the memory limit, do one full scan of the right side.
   I don't think there are many tuning opportunities here, though. Scanning the input can be expensive (for example, if it's a Parquet file), so the goal is to minimize the number of right-side scans, which means buffering as many left batches as the memory limit allows.
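A rough cost model for this scheme: with |L| left rows, a memory budget of M rows, and a right input of size |R|, the right side is scanned at least ⌈|L| / M⌉ times, so the total input read is roughly |L| + ⌈|L| / M⌉ · |R|. The minimal Python sketch below illustrates the loop structure under stated assumptions: an inner join only, plain lists standing in for Arrow record batches, and a memory limit counted in rows rather than bytes. `chunk_left`, `block_nested_loop_join`, `estimated_right_scans`, and `memory_limit_rows` are hypothetical names for illustration, not DataFusion APIs.

```python
import math
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical stand-in for an Arrow RecordBatch: a plain list of join-key values.
Batch = List[int]


def chunk_left(left_batches: Iterable[Batch], memory_limit_rows: int) -> Iterator[List[Batch]]:
    """Greedily group left batches into chunks whose total row count stays
    within memory_limit_rows. A single batch larger than the limit still gets
    its own chunk, because this sketch never splits batches."""
    chunk: List[Batch] = []
    rows = 0
    for batch in left_batches:
        if chunk and rows + len(batch) > memory_limit_rows:
            yield chunk
            chunk, rows = [], 0
        chunk.append(batch)
        rows += len(batch)
    if chunk:
        yield chunk


def block_nested_loop_join(
    left_batches: Iterable[Batch],
    scan_right: Callable[[], Iterable[Batch]],  # re-opens the right input (e.g. a file scan)
    predicate: Callable[[int, int], bool],
    memory_limit_rows: int,
) -> Tuple[List[Tuple[int, int]], int]:
    """Inner block nested loop join. Returns the matching (left, right) pairs
    and the number of times the right input was scanned."""
    output: List[Tuple[int, int]] = []
    right_scans = 0
    for chunk in chunk_left(left_batches, memory_limit_rows):
        # One full pass over the right input per buffered left chunk.
        right_scans += 1
        for right_batch in scan_right():
            for left_batch in chunk:
                for left_value in left_batch:
                    for right_value in right_batch:
                        if predicate(left_value, right_value):
                            output.append((left_value, right_value))
    return output, right_scans


def estimated_right_scans(left_rows: int, memory_limit_rows: int) -> int:
    """Lower bound on right-side scans, ceil(|L| / M). Greedy chunking may
    need more passes when batch sizes do not pack evenly into the budget."""
    return math.ceil(left_rows / memory_limit_rows)
```

For example, six left rows in three 2-row batches with `memory_limit_rows=4` form two chunks, so the right side is scanned twice; raising the limit to 6 buffers the whole left side and needs a single right scan. The memory budget per pass is the main knob, which is why buffering as much of the left side as possible matters most.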


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

