xudong963 commented on issue #15191:
URL: https://github.com/apache/datafusion/issues/15191#issuecomment-2756831956

   > The only reason it is not needed here is because there are fewer files than `target_partitions`, so this will not work if we increase the number of files or reduce `target_partitions`. If we set `target_partitions` to 1 then it requires a sort:
   
   I reread the codebase, and I think so too.
   
   
   [FileScanConfig::split_groups_by_statistics](https://github.com/apache/datafusion/blob/main/datafusion/datasource/src/file_scan_config.rs#L569) can definitely solve the problem. With it, we can remove the unnecessary `SortExec`, which would be a significant gain!
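
To make the idea concrete, here is a minimal sketch of statistics-based grouping. It is not the DataFusion implementation: `FileRange` and `split_groups_by_min_max` are hypothetical names, and it assumes each file carries min/max statistics for a single integer sort column. Files are packed greedily so that, within a group, the ranges are ordered and do not overlap, meaning each group can be read back-to-back as one sorted stream.

```rust
/// Hypothetical per-file statistics for the sort column (not a DataFusion type).
#[derive(Debug, Clone, PartialEq)]
struct FileRange {
    name: String,
    min: i64,
    max: i64,
}

/// Greedy first-fit grouping: after sorting files by their minimum value,
/// each file goes into the first group whose last file ends strictly before
/// this file starts. A new group is opened only when no such group exists.
fn split_groups_by_min_max(mut files: Vec<FileRange>) -> Vec<Vec<FileRange>> {
    files.sort_by_key(|f| (f.min, f.max));
    let mut groups: Vec<Vec<FileRange>> = Vec::new();
    for file in files {
        // Strict comparison is a conservative choice for this sketch:
        // ranges that merely touch are treated as overlapping.
        let slot = groups
            .iter()
            .position(|g| g.last().map_or(true, |last| last.max < file.min));
        match slot {
            Some(i) => groups[i].push(file),
            None => groups.push(vec![file]),
        }
    }
    groups
}
```

In this sketch, overlapping files end up in different groups, so the number of groups can exceed `target_partitions`; in that case some form of sort or merge would presumably still be required.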
   
   One question: is there anything that makes it difficult to turn on `split_groups_by_statistics` by default?



