AlenkaF commented on issue #37729:
URL: https://github.com/apache/arrow/issues/37729#issuecomment-1853734264

   I tried the code locally on dev (`15.0.0.dev`) and wasn't able to reproduce the issue:
   
   ```python
   In [1]: import pyarrow.parquet as pq
      ...: 
      ...: table_1 = pq.read_table("python/table_1.parquet")
      ...: table_2 = pq.read_table("python/table_2.parquet")
      ...: 
      ...: inner_joined = table_1.join(table_2, ["col_1", "col_2", "col_3"], join_type="inner",)
      ...: print(len(inner_joined))
      ...: 
      ...: inner_joined = table_2.join(table_1, ["col_1", "col_2", "col_3"], join_type="inner", use_threads=False)
      ...: print(len(inner_joined))
      ...: 
      ...: outer_joined = table_1.join(table_2, ["col_1", "col_2", "col_3"], join_type="left outer")
      ...: print(len(outer_joined))
      ...: 
      ...: print(outer_joined.column("col_4").null_count)
   
   6289
   6289
   6289
   0
   ```
   
   but when running the same code on pyarrow version `13.0.0`, I get the reported behaviour:
   
   ```python
   >>> import pyarrow as pa
   >>> pa.__version__
   '13.0.0'
   
   >>> import pyarrow.parquet as pq
   >>> table_1 = pq.read_table("../repos/arrow/python/table_1.parquet")
   >>> table_2 = pq.read_table("../repos/arrow/python/table_2.parquet")
   
   >>> inner_joined = table_1.join(table_2, ["col_1", "col_2", "col_3"], join_type="inner",)
   >>> print(len(inner_joined))
   3596
   
   >>> inner_joined = table_2.join(table_1, ["col_1", "col_2", "col_3"], join_type="inner", use_threads=False)
   >>> print(len(inner_joined))
   3601
   
   >>> outer_joined = table_1.join(table_2, ["col_1", "col_2", "col_3"], join_type="left outer")
   >>> print(len(outer_joined))
   6289
   
   >>> print(outer_joined.column("col_4").null_count)
   0
   ```
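
Since the dev and `13.0.0` inner-join counts disagree (6289 vs. 3596/3601), it can help to compute the correct row counts independently of Arrow's hash join. For an equi-join, each key value contributes `left_count * right_count` rows to the inner join, so the result cannot depend on which table is on the left or on `use_threads`. Below is a minimal pure-Python sketch of such an oracle. The helper names are hypothetical (not part of pyarrow), and it assumes null keys never match each other, which is an assumption about Arrow's default that is not verified here.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple

Key = Tuple[object, ...]


def _key_counts(keys: Iterable[Key]) -> Counter:
    """Count how often each key tuple occurs, ignoring tuples that contain None.

    Assumption: a key containing a null never matches anything (SQL-style).
    """
    return Counter(k for k in keys if None not in k)


def expected_inner_join_rows(left: Sequence[Key], right: Sequence[Key]) -> int:
    """Inner equi-join size: sum over keys of count_left[k] * count_right[k]."""
    left_counts, right_counts = _key_counts(left), _key_counts(right)
    # Counter returns 0 for missing keys, so unmatched keys contribute nothing.
    return sum(n * right_counts[k] for k, n in left_counts.items())


def expected_left_outer_join_rows(left: Sequence[Key], right: Sequence[Key]) -> int:
    """Left outer join size: each left row appears once per match, or once if unmatched."""
    right_counts = _key_counts(right)
    total = 0
    for k in left:
        matches = 0 if None in k else right_counts[k]
        total += max(1, matches)
    return total


if __name__ == "__main__":
    # Small synthetic keys standing in for ("col_1", "col_2", "col_3").
    left = [(1, "a", 10), (1, "a", 10), (2, "b", 20), (None, "c", 30)]
    right = [(1, "a", 10), (2, "b", 20), (2, "b", 20), (3, "d", 40)]
    print(expected_inner_join_rows(left, right))       # 4
    print(expected_inner_join_rows(right, left))       # 4, same when swapped
    print(expected_left_outer_join_rows(left, right))  # 5
```

To run it on the tables above, the key columns could be turned into tuples with something like `list(zip(*(table_1.column(c).to_pylist() for c in ["col_1", "col_2", "col_3"])))`. A count that differs from `len(inner_joined)` would point at the join rather than at the data.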
   
   I will try to see which PR fixed this issue.

