xingyu-long commented on PR #46566: URL: https://github.com/apache/arrow/pull/46566#issuecomment-2921331804
> This is an independent problem. Because join is concatenating columns from both sides, it is possible that the result table contains columns with the same name. If so, you won't be able to further reference such a column without ambiguity. You can specify output_suffix_for_left/right to append unique identifiers to their column names, so that you can disambiguate them.

I see. So if I understand this correctly, we should ideally give the two columns distinct names before using the filter expression, since output_suffix_for_left only applies to the output at the end of the workflow, right? (Sorry if this is a dumb question...)

i.e., something like this won't work:

```python3
import pyarrow.compute as pc
from pyarrow.acero import Declaration, HashJoinNodeOptions

# left_source and right_source are source declarations defined elsewhere.
join_opts = HashJoinNodeOptions(
    "inner", left_keys="key", right_keys="key",
    output_suffix_for_left="_left", output_suffix_for_right="_right",
    # <-- fails: 'key_left' is not found in either input schema.
    filter=pc.equal(pc.field('key_left'), 2))
joined = Declaration(
    "hashjoin", options=join_opts, inputs=[left_source, right_source])
result = joined.to_table()
```

If we don't use a filter at all, having the same column name on both sides is fine, and output_suffix_for_left only helps with the output.

@zanmato1984

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
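To make the name-resolution concern in the comment concrete, here is a toy model in plain Python. It is not Arrow's implementation and makes no pyarrow calls. `bind_filter_field` and `output_names` are hypothetical helpers. The sketch assumes, as the comment suggests, that the filter binds field names against the two input schemas, and that suffixes are applied only to colliding names when the output schema is built.

```python
# Toy model of the behaviour discussed above; NOT Arrow's implementation.
# Assumption: the join filter resolves names against the input schemas,
# while output suffixes only affect the final output column names.

def bind_filter_field(name, left_schema, right_schema):
    """Resolve a filter field name against both input schemas (hypothetical)."""
    in_left = name in left_schema
    in_right = name in right_schema
    if in_left and in_right:
        raise LookupError(f"field {name!r} is ambiguous: present in both inputs")
    if not (in_left or in_right):
        raise LookupError(f"field {name!r} not found in either input schema")
    return ("left" if in_left else "right", name)


def output_names(left_schema, right_schema, left_suffix, right_suffix):
    """Build output names, suffixing only colliding ones (assumed behaviour)."""
    colliding = set(left_schema) & set(right_schema)
    left_out = [n + left_suffix if n in colliding else n for n in left_schema]
    right_out = [n + right_suffix if n in colliding else n for n in right_schema]
    return left_out + right_out


left = ["key", "a"]
right = ["key", "b"]

# Neither the suffixed name nor the shared name can be bound by the filter.
for name in ("key_left", "key"):
    try:
        bind_filter_field(name, left, right)
    except LookupError as exc:
        print(exc)

# The suffixed names exist only in the output schema.
print(output_names(left, right, "_left", "_right"))

# Renaming the keys up front gives the filter an unambiguous name to bind.
print(bind_filter_field("l_key", ["l_key", "a"], ["r_key", "b"]))
```

Under these assumptions, the practical workaround is to rename the key columns on each side before the join, so the filter can refer to a name that exists in exactly one input.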