bepec opened a new issue, #41706:
URL: https://github.com/apache/arrow/issues/41706

   ### Describe the usage question you have. Please include as many useful details as possible.
   
   
   With pyarrow 16.0.0, `join_asof` fails even though both input tables are sorted by the "on" key.
   I noticed this when joining larger sorted tables: for example, it fails for tables with 1061753 and 994046 rows, but succeeds if I reduce them to 1048178 and 975257 rows.
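
The row counts above can be compared against a power-of-two boundary. The sketch below only does that arithmetic. The choice of 2**20 as the boundary is an assumption made for illustration, not something confirmed by pyarrow or Acero, and `exceeds_boundary` is a hypothetical helper name.

```python
# Row counts are taken from the report above.
# ASSUMPTION: 2**20 is a guess at a relevant batch or chunk boundary.
BOUNDARY = 2**20  # 1_048_576

row_counts = {
    "fails (left)": 1_061_753,
    "fails (right)": 994_046,
    "works (left)": 1_048_178,
    "works (right)": 975_257,
}


def exceeds_boundary(n_rows: int, boundary: int = BOUNDARY) -> bool:
    """Return True if a table with n_rows rows is larger than the boundary."""
    return n_rows > boundary


for label, n in row_counts.items():
    print(f"{label:>14}: {n:>9,} rows, above 2**20: {exceeds_boundary(n)}")
```

Of the four tables, only the left table of the failing case is above 2**20 rows. That fits the suspicion that size matters, but it does not prove it.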
   
   I think this behavior can be reproduced with the example below:
   ```
   import numpy as np
   import pyarrow as pa
   ts0 = 0
   nticks = 2_000_000 # it's OK for nticks = 1_000_000
   ncats = 10
   ticks = np.arange(ts0, ts0 + nticks)
   cats = np.arange(0, ncats).repeat(nticks // ncats)  # integer repeat count
   t1 = pa.Table.from_pydict({"ts": ticks, "cats": cats})
   t2 = pa.Table.from_pydict({"ts": ticks, "cats": cats})
   t1.join_asof(t2, on="ts", tolerance=-10, by="cats")
   
   # Last line fails with error:
   ---------------------------------------------------------------------------
   ArrowInvalid                              Traceback (most recent call last)
   Cell In[273], line 10
         8 t1 = pa.Table.from_pydict({"ts": ticks, "cats": cats})
         9 t2 = pa.Table.from_pydict({"ts": ticks, "cats": cats})
   ---> 10 t1.join_asof(t2, on="ts", tolerance=-10, by="cats")
   
   File /lib/python3.10/site-packages/pyarrow/table.pxi:5528, in pyarrow.lib.Table.join_asof()
   
   File /lib/python3.10/site-packages/pyarrow/acero.py:333, in _perform_join_asof(left_operand, left_on, left_by, right_operand, right_on, right_by, tolerance, use_threads, output_type)
       326 join_opts = AsofJoinNodeOptions(
       327     left_on, left_by, right_on, right_by, tolerance
       328 )
       329 decl = Declaration(
       330     "asofjoin", options=join_opts, inputs=[left_source, right_source]
       331 )
   --> 333 result_table = decl.to_table(use_threads=use_threads)
       335 if output_type == Table:
       336     return result_table
   
   File /lib/python3.10/site-packages/pyarrow/_acero.pyx:590, in pyarrow._acero.Declaration.to_table()
   
   File /lib/python3.10/site-packages/pyarrow/error.pxi:154, in pyarrow.lib.pyarrow_internal_check_status()
   
   File /lib/python3.10/site-packages/pyarrow/error.pxi:91, in pyarrow.lib.check_status()
   
   ArrowInvalid: AsofJoin does not allow out-of-order on-key values
   ```
   
   So I suspect the error is caused by the input size rather than by the order of the on-key values.
   Is this a bug that can be fixed, or a fundamental limitation?
   Is there any workaround other than limiting the input size?
   
   ### Component(s)
   
   Python

