jonmmease commented on issue #5034:
URL: https://github.com/apache/arrow-datafusion/issues/5034#issuecomment-1402009879

   Looks like there's something going wrong with column naming.  I added some 
print statements to this function to log the `left`, `right`, and `on` 
arguments.
   
   
https://github.com/apache/arrow-datafusion/blob/ab00bc11835f98dd06fa1262d23db2ce1e53a154/datafusion/core/src/physical_plan/joins/utils.rs#L75-L93
   
   Without the `DISTINCT` qualifier, it looks like this:
   
   ```
   left: {Column { name: "colA", index: 0 }, Column { name: "colB", index: 1 }}
   right: {Column { name: "colB", index: 0 }, Column { name: "colC", index: 1 }}
   on: [(Column { name: "colB", index: 1 }, Column { name: "colB", index: 0 })]
   ```
   
   With the `DISTINCT` qualifier, it looks like this:
   ```
   left: {Column { name: "colA", index: 0 }, Column { name: "colB", index: 1 }}
   right: {Column { name: "tbl.colB", index: 0 }, Column { name: "colC", index: 1 }}
   on: [(Column { name: "colB", index: 1 }, Column { name: "colB", index: 0 })]
   ```
   
   So I think the direct cause of this error is that the right-side `colB` 
column gets renamed to `tbl.colB` at some point during physical planning, so 
the `colB` entry in `on` no longer matches any column in `right`.
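
To make the mismatch concrete, here is a minimal sketch of a name-and-index membership check over the logged values. It is not the DataFusion code: the `Column` struct is a local stand-in, `missing_on_columns` is a hypothetical helper, and it assumes the linked function rejects a join when an `on` column is absent from the matching side's column set.

```rust
use std::collections::HashSet;

/// Local stand-in for a physical column: a name plus its index in the schema.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Column {
    name: String,
    index: usize,
}

impl Column {
    fn new(name: &str, index: usize) -> Self {
        Column {
            name: name.to_string(),
            index,
        }
    }
}

/// Hypothetical helper: returns the `on` columns that are absent from the
/// left and right column sets. A column matches only if both its name and
/// its index are equal.
fn missing_on_columns(
    left: &HashSet<Column>,
    right: &HashSet<Column>,
    on: &[(Column, Column)],
) -> (Vec<Column>, Vec<Column>) {
    let left_missing = on
        .iter()
        .map(|(l, _)| l)
        .filter(|c| !left.contains(*c))
        .cloned()
        .collect();
    let right_missing = on
        .iter()
        .map(|(_, r)| r)
        .filter(|c| !right.contains(*c))
        .cloned()
        .collect();
    (left_missing, right_missing)
}
```

With the values logged for the `DISTINCT` case, the right-hand `on` column `colB` at index 0 is reported as missing, because the only column at index 0 in `right` is named `tbl.colB`. With the values logged without `DISTINCT`, nothing is missing on either side.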
   


