2010YOUY01 commented on PR #18817:
URL: https://github.com/apache/datafusion/pull/18817#issuecomment-3569439710

   > Thank you @2010YOUY01 for review and valid concern:
   You raise valid concerns about memory overhead, which is what I mentioned as 
the key risk for this approach.
   However, I want to clarify that row group reversal alone cannot eliminate 
the SortExec - it only provides TopK filtering benefits. Without reversing rows 
within each row group, the data remains in the original order (e.g., ASC when 
we need DESC), so the sort must stay. I propose we keep the complete 
optimization but default enable_reverse_scan to false. Once we implement 
page-level caching in arrow-rs (which will reduce memory overhead 
significantly), we can consider enabling it by default.
   
   Did you mean 'cannot eliminate the SortExec(TopK)'? Just to confirm, there is 
no global sort, but it is true that we have to do a `TopK` on a whole row group 
for this naive approach.
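   
   A minimal sketch of that row-group-level idea, under stated assumptions: it 
is not the code from the linked comment and uses no DataFusion or `arrow-rs` 
APIs. The function `top_k_desc` and the `Vec<Vec<i64>>` layout are hypothetical 
stand-ins for a file whose values are sorted ascending across row groups. Row 
groups are visited last to first, rows inside each group keep their original 
order, and a bounded heap plays the role of `TopK`. The early exit is an assumed 
stand-in for pruning by row-group statistics.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical model: each inner Vec is one row group, and values are
/// sorted ascending across the whole file (ORDER BY v DESC LIMIT k).
fn top_k_desc(row_groups: &[Vec<i64>], k: usize) -> Vec<i64> {
    if k == 0 {
        return Vec::new();
    }
    // Min-heap holding the k largest values seen so far.
    let mut heap: BinaryHeap<Reverse<i64>> = BinaryHeap::new();

    // Visit row groups in reverse; rows inside a group are NOT reversed.
    for group in row_groups.iter().rev() {
        // Assumed pruning step: once the heap is full, a group whose largest
        // value cannot beat the current threshold adds nothing, and because
        // the file is ascending, no earlier group can either.
        if heap.len() == k {
            let threshold = heap.peek().map(|r| r.0);
            let group_max = group.last().copied();
            if let (Some(t), Some(m)) = (threshold, group_max) {
                if m <= t {
                    break;
                }
            }
        }
        // TopK over the whole row group.
        for &v in group {
            if heap.len() < k {
                heap.push(Reverse(v));
            } else if heap.peek().map(|r| r.0 < v).unwrap_or(false) {
                heap.pop();
                heap.push(Reverse(v));
            }
        }
    }

    // Emit the retained values in descending order.
    let mut out: Vec<i64> = heap.into_iter().map(|Reverse(v)| v).collect();
    out.sort_unstable_by(|a, b| b.cmp(a));
    out
}
```

   In this model, a small `LIMIT` usually touches only the last row group, at 
the cost of a `TopK` pass over every row of that group.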
   
   My intuition is that for this kind of workload, the bottleneck is Parquet 
decoding speed, and an extra `TopK` won't introduce much additional 
overhead, so this naive approach can also be pretty fast.
   
   It makes a lot of sense that page/row-level reversal is very hard to 
implement on the `arrow-rs` side, so we have to figure out how to do this at 
the row-group level.
   
   Summary: Perhaps we can start by adding a few end-to-end benchmarks that 
reflect your typical production workload. If this PR’s approach shows a clear 
improvement over the naive approach in 
https://github.com/apache/datafusion/pull/18817#issuecomment-3568934764 (I'm 
happy to do a quick prototype), we should definitely move forward.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

