anjakefala opened a new issue, #43716: URL: https://github.com/apache/arrow/issues/43716
### Describe the enhancement requested

Acero's Hash Join does not support `ListType` in non-key fields for a hash join: https://github.com/apache/arrow/blob/main/cpp/src/arrow/acero/hash_join_node.cc#L48 . This is a request to add that support.

PyArrow code that reproduces here:

```
import pyarrow as pa
import pyarrow.acero as acero

# Creating the Arrow tables
basic_tbl = pa.table({'x': [1, 2, 3], 'y': ['a', 'b', 'c']})
basic_tbl_src = acero.Declaration("table_source", options=acero.TableSourceNodeOptions(basic_tbl))

basic_tbl2 = pa.table({'x': [1, 2, 3], 'z': [True, False, True]})
basic_tbl2_src = acero.Declaration("table_source", options=acero.TableSourceNodeOptions(basic_tbl2))

list_tbl = pa.table({'z': [['first', 'list', 'col', 'row'], ['second row', 'here']], 'x': [1, 2]})
list_tbl_src = acero.Declaration("table_source", options=acero.TableSourceNodeOptions(list_tbl))

join_keys = ["x"]
hash_join_options = acero.HashJoinNodeOptions('left outer', left_keys=join_keys, right_keys=join_keys)

joined = acero.Declaration(
    "hashjoin", options=hash_join_options, inputs=[basic_tbl_src, basic_tbl2_src])
result = joined.to_table()
print(result)

# list table
joined = acero.Declaration(
    "hashjoin", options=hash_join_options, inputs=[basic_tbl_src, list_tbl_src])
result = joined.to_table()
print(result)
```

R code here: https://issues.apache.org/jira/browse/ARROW-14519

In [that link](https://issues.apache.org/jira/browse/ARROW-14519), the reason there currently isn't support was noted:

> We cannot easily support more types in hash join right now. That is because we transform and encode all the input values, key and non-key (row_encoder.h), so it would need another specialization for each additional type.

So to add this support, it seems like we will need to add the specialisation for the encoding of `ListType`.

### Component(s)

C++
