2010YOUY01 commented on issue #16065:
URL: https://github.com/apache/datafusion/issues/16065#issuecomment-2888233712

   Welcome aboard! We're excited to collaborate with you on this GSoC project 😄
   
   Regarding the plan, I can see the following sub-tasks:
   
   1. Stabilize external sort and aggregate.
   2. Implement a memory-limited nested loop join (NLJ). This serves as a safe 
fallback in case external sort-merge join (SMJ) or future external hash join 
(HJ) implementations fail in certain scenarios. It can also be used for 
differential testing against other join executor implementations.
   3. Optimize the spill format, likely building on top of Arrow's IPC stream 
reader/writer.
   (We can also improve UX and performance along the way.)
   
   I plan to open separate issues for each sub-task to better describe the 
problems and outline the approaches.
   
   Are there any other tasks worth exploring? I'm not very familiar with Arrow 
IPC internals. Are there any stream reader/writer-related tasks we could also 
consider? @alamb 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

