jonmmease commented on issue #5034: URL: https://github.com/apache/arrow-datafusion/issues/5034#issuecomment-1402009879
Looks like there's something going wrong with column naming. I added some print statements to this function to log the `left`, `right`, and `on` arguments.

https://github.com/apache/arrow-datafusion/blob/ab00bc11835f98dd06fa1262d23db2ce1e53a154/datafusion/core/src/physical_plan/joins/utils.rs#L75-L93

Without the `DISTINCT` qualifier, it looks like this:

```
left: {Column { name: "colA", index: 0 }, Column { name: "colB", index: 1 }}
right: {Column { name: "colB", index: 0 }, Column { name: "colC", index: 1 }}
on: [(Column { name: "colB", index: 1 }, Column { name: "colB", index: 0 })]
```

With the `DISTINCT` qualifier, it looks like this:

```
left: {Column { name: "colA", index: 0 }, Column { name: "colB", index: 1 }}
right: {Column { name: "tbl.colB", index: 0 }, Column { name: "colC", index: 1 }}
on: [(Column { name: "colB", index: 1 }, Column { name: "colB", index: 0 })]
```

So I think the direct cause of this error is that the right side's `colB` column gets named `tbl.colB` at some point during physical planning, while the right-hand entry in `on` still refers to plain `colB` at index 0 and so no longer matches any column in `right`.
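To make the mismatch concrete, here's a minimal sketch of the kind of check that would fail, assuming the validation boils down to a set difference over `(name, index)` pairs. `Column` and `missing_on_right` below are hypothetical stand-ins written for this illustration, not DataFusion's actual types or functions.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the `Column` values printed above:
/// a name plus a positional index. Equality and hashing use both fields.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Column {
    name: String,
    index: usize,
}

impl Column {
    fn new(name: &str, index: usize) -> Self {
        Column {
            name: name.to_string(),
            index,
        }
    }
}

/// Returns the right-hand `on` columns that are absent from `right`.
/// (Assumed shape of the check; the real function may differ.)
fn missing_on_right(right: &HashSet<Column>, on: &[(Column, Column)]) -> HashSet<Column> {
    // Collect the right-hand side of every join key pair.
    let on_right: HashSet<Column> = on.iter().map(|(_, r)| r.clone()).collect();
    // Anything in `on` that the right schema does not contain is "missing".
    on_right.difference(right).cloned().collect()
}
```

Fed the two `right` sets logged above, this returns an empty set for the query without `DISTINCT`, and a set containing `Column { name: "colB", index: 0 }` for the query with `DISTINCT`, because `tbl.colB` and `colB` compare as different names even though the index is the same.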
