MohamedAbdeen21 commented on issue #9488:
URL: https://github.com/apache/arrow-datafusion/issues/9488#issuecomment-2015983979

   The query in the issue occasionally returns the correct answer; however, increasing the offset to a large enough number (still less than the total row count) almost never works as expected.
   
   The first option works well, but it kills parallelism. The second option isn't always viable, especially when the data isn't already sorted.
   
   Two possible solutions I can think of:
   
   - Attach the partition number to the record batch and take it into account when executing the limit. This would kill the early termination of the input stream, though.
   
   - Attach the partition number as a new column and insert a `sort by` on that column before the limit. This doesn't look easy, given that DF doesn't have a `with_column` that can append columns, as far as I can tell (see the sketch below).
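   For illustration only, here is a rough sketch of what appending a constant partition-id column to a batch could look like using plain arrow-rs; the `append_partition_id` helper and the `__partition_id` column name are hypothetical, not an existing DataFusion API:

   ```rust
   use std::sync::Arc;

   use arrow::array::{ArrayRef, UInt32Array};
   use arrow::datatypes::{DataType, Field, Schema};
   use arrow::error::ArrowError;
   use arrow::record_batch::RecordBatch;

   /// Hypothetical helper: return a new batch with a constant
   /// `__partition_id` column appended after the existing columns.
   fn append_partition_id(batch: &RecordBatch, partition: u32) -> Result<RecordBatch, ArrowError> {
       // New schema = existing fields + the partition-id field.
       let mut fields: Vec<Field> = batch
           .schema()
           .fields()
           .iter()
           .map(|f| f.as_ref().clone())
           .collect();
       fields.push(Field::new("__partition_id", DataType::UInt32, false));
       let schema = Arc::new(Schema::new(fields));

       // New columns = existing arrays + the partition number repeated
       // once per row of this batch.
       let mut columns: Vec<ArrayRef> = batch.columns().to_vec();
       columns.push(Arc::new(UInt32Array::from(vec![partition; batch.num_rows()])) as ArrayRef);

       RecordBatch::try_new(schema, columns)
   }
   ```

   With something like this, the limit could be preceded by a sort on `__partition_id` so that rows are consumed in a deterministic partition order before the offset is applied.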

