[GitHub] [arrow-datafusion] mustafasrepo opened a new issue, #6118: Project output ordering at the source

via GitHub Tue, 25 Apr 2023 08:09:29 -0700


mustafasrepo opened a new issue, #6118:
URL: https://github.com/apache/arrow-datafusion/issues/6118


   ### Describe the bug
   
   Assume that the input source is already ordered by column `a`, and that its schema consists of columns `a, b` (in that order). I then run the query below:
   ```sql
   SELECT a FROM annotated_data
     ORDER BY a
   ``` 
   It produces the following plan:
   ``` 
   "CsvExec: files={1 group: [[FILE_PATH]]}, has_header=true, limit=None, projection=[a]",
   ``` 
   However, if the input source schema consists of columns `b, a` (in that order), the same query produces the following plan:
   ``` 
   "SortExec: expr=[a@0 ASC NULLS LAST]",
   "  CsvExec: files={1 group: [[FILE_PATH]]}, has_header=true, limit=None, projection=[a]",
   ``` 
   
   ### To Reproduce
   
   _No response_
   
   ### Expected behavior
   
   I expect the second case not to produce a `SortExec` in its physical plan, since the source is already ordered by `a`.
   
   ### Additional context
   
   I think that when `output_ordering` is calculated for sources, the projection is not taken into account. Hence the generated `output_ordering` may not always be valid.
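
To illustrate the idea, here is a minimal, self-contained sketch of projecting an ordering through a projection. It is not DataFusion code: `project_ordering` is a hypothetical helper, and representing both the ordering and the projection as plain column indices into the source schema is an assumption made for brevity (sort options such as `ASC NULLS LAST` are omitted).

```rust
/// Hypothetical helper (not part of DataFusion).
///
/// `ordering`   - sort keys as column indices into the full source schema,
///                most significant key first.
/// `projection` - source column indices kept by the scan, in output order.
///
/// Returns the ordering re-expressed as indices into the projected schema.
/// Only the longest leading run of sort keys that survive the projection
/// is kept, because a later key is meaningless once an earlier one is dropped.
fn project_ordering(ordering: &[usize], projection: &[usize]) -> Vec<usize> {
    let mut projected = Vec::new();
    for &col in ordering {
        // Find where this source column lands in the projected output, if anywhere.
        match projection.iter().position(|&p| p == col) {
            Some(new_idx) => projected.push(new_idx),
            // The sort key was projected away: stop, the remaining keys
            // no longer describe a valid ordering of the output.
            None => break,
        }
    }
    projected
}
```

With schema `b, a`, an ordering on `a` (source index 1), and `projection=[a]` (also source index 1), the helper returns index 0, which corresponds to `a@0` in the projected schema. Under that assumption the required ordering would already be satisfied and no `SortExec` would be needed.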


