0x26res commented on issue #41706:
URL: https://github.com/apache/arrow/issues/41706#issuecomment-2127377220

   I have a similar issue with a smaller table. 
   
   It only happens if I have a lot of small chunks in the table.
   
   Here's an example:
   
   ```
   import pyarrow as pa
   import pytest
   from pandas import Timestamp
   
   LEFT = [
       {"left_on": Timestamp("2023-09-07 12:00:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 12:15:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 12:30:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 12:45:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 13:00:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 13:15:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 13:30:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 13:45:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 14:00:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 14:15:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 14:30:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 14:45:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 15:00:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 15:15:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 15:30:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 15:45:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 16:00:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 16:15:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 16:30:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 16:45:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 17:00:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 17:15:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 17:30:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 17:45:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 18:00:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 18:15:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 18:30:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 18:45:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 19:00:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 19:15:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 19:30:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 19:45:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 20:00:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 20:15:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 20:30:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 20:45:00+0000", tz="UTC"), "left_by": "SYM1"},
       {"left_on": Timestamp("2023-09-07 21:00:00+0000", tz="UTC"), "left_by": "SYM1"},
   ]
   RIGHT = [
       {
           "right_on": Timestamp("2023-09-07 15:00:00+0000", tz="UTC"),
           "right_by": "SYM1",
       }
   ]
   
   
   def test_asofjoin_order():
       left: pa.Table = pa.Table.from_pylist(LEFT)
       right = pa.Table.from_pylist(RIGHT)
   
       left = pa.concat_tables(left[i : i + 1] for i in range(left.num_rows))
       assert left[left.column_names[0]] == left[left.column_names[0]].sort()
       assert right[right.column_names[0]] == right[right.column_names[0]].sort()
       with pytest.raises(
           pa.ArrowInvalid, match="AsofJoin does not allow out-of-order on-key values"
       ):
           left.join_asof(
               right,
               on=left.column_names[0],
               by=left.column_names[1],
               right_on=right.column_names[0],
               right_by=right.column_names[1],
               tolerance=-9_223_372_036_854_775_808,
           )
   ```
   
   
   It took a while to build a reproducible example, and I can't pin down exactly what is causing the issue.


