AlenkaF commented on issue #37729:
URL: https://github.com/apache/arrow/issues/37729#issuecomment-1853734264
I tried the code locally on the development version (`15.0.0.dev`) and wasn't able to reproduce the issue:
```python
In [1]: import pyarrow.parquet as pq
...:
...: table_1 = pq.read_table("python/table_1.parquet")
...: table_2 = pq.read_table("python/table_2.parquet")
...:
...: inner_joined = table_1.join(table_2, ["col_1", "col_2", "col_3"],
...:                             join_type="inner",)
...: print(len(inner_joined))
...:
...: inner_joined = table_2.join(table_1, ["col_1", "col_2", "col_3"],
...:                             join_type="inner", use_threads=False)
...: print(len(inner_joined))
...:
...: outer_joined = table_1.join(table_2, ["col_1", "col_2", "col_3"],
...:                             join_type="left outer")
...: print(len(outer_joined))
...:
...: print(outer_joined.column("col_4").null_count)
6289
6289
6289
0
```
But when running the same code on pyarrow `13.0.0`, I get the reported behaviour:
```python
>>> import pyarrow as pa
>>> pa.__version__
'13.0.0'
>>> import pyarrow.parquet as pq
>>> table_1 = pq.read_table("../repos/arrow/python/table_1.parquet")
>>> table_2 = pq.read_table("../repos/arrow/python/table_2.parquet")
>>> inner_joined = table_1.join(table_2, ["col_1", "col_2", "col_3"],
...                             join_type="inner",)
>>> print(len(inner_joined))
3596
>>> inner_joined = table_2.join(table_1, ["col_1", "col_2", "col_3"],
...                             join_type="inner", use_threads=False)
>>> print(len(inner_joined))
3601
>>> outer_joined = table_1.join(table_2, ["col_1", "col_2", "col_3"],
...                             join_type="left outer")
>>> print(len(outer_joined))
6289
>>> print(outer_joined.column("col_4").null_count)
0
```
I will try to find which PR fixed this issue.
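While bisecting, one way to tell a correct build from a broken one is to compare the join row counts against an independent oracle. For an inner equi-join, each key value `k` contributes `count_left(k) * count_right(k)` rows, so the result must be the same whichever table is on the left. The 13.0.0 run above returns 3596 vs 3601, which breaks that symmetry. Below is a minimal pure-Python sketch. The helper names are hypothetical, and it assumes that rows with a null in any key column never match (the SQL convention, assumed here rather than taken from the Arrow docs). The key tuples could be extracted with something like `[tuple(row.values()) for row in table.select(["col_1", "col_2", "col_3"]).to_pylist()]`.

```python
from collections import Counter
from typing import Iterable, Tuple

Key = Tuple[object, ...]


def expected_inner_join_rows(left_keys: Iterable[Key], right_keys: Iterable[Key]) -> int:
    """Expected row count of an inner equi-join (hypothetical helper).

    Each non-null key contributes count_left(k) * count_right(k) rows,
    so the result is symmetric in its two arguments.
    Assumption: keys containing None never match.
    """
    left = Counter(k for k in left_keys if None not in k)
    right = Counter(k for k in right_keys if None not in k)
    # Counter returns 0 for keys that are absent on the right side.
    return sum(n * right[k] for k, n in left.items())


def expected_left_outer_join_rows(left_keys: Iterable[Key], right_keys: Iterable[Key]) -> int:
    """Expected row count of a left outer equi-join (hypothetical helper).

    Every left row appears once per matching right row, or once
    (padded with nulls) if it has no match.
    Assumption: keys containing None never match.
    """
    right = Counter(k for k in right_keys if None not in k)
    total = 0
    for k in left_keys:
        matches = right[k] if None not in k else 0
        total += max(1, matches)
    return total


if __name__ == "__main__":
    a = [(1, "x", 0), (1, "x", 0), (2, "y", 1), (None, "z", 2)]
    b = [(1, "x", 0), (2, "y", 1), (2, "y", 1), (3, "w", 5)]
    print(expected_inner_join_rows(a, b), expected_inner_join_rows(b, a))
    print(expected_left_outer_join_rows(a, b))
```

With the two parquet files from the issue, these helpers give numbers to compare against `len(inner_joined)` and `len(outer_joined)` on each candidate build. A build whose inner join count depends on argument order, or disagrees with the oracle, still has the bug.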