judahrand commented on code in PR #34234:
URL: https://github.com/apache/arrow/pull/34234#discussion_r1125748731
##########
python/pyarrow/_dataset.pyx:
##########
@@ -514,6 +514,53 @@ cdef class Dataset(_Weakrefable):
use_threads=use_threads,
coalesce_keys=coalesce_keys,
output_type=InMemoryDataset)
+ def join_asof(self, right_dataset, on, by, tolerance, right_on=None,
right_by=None):
+ """
+ Perform an asof join between this dataset and another one.
+
+ Result of the join will be a new dataset, where further
+ operations can be applied.
+
+ Parameters
+ ----------
+ right_dataset : dataset
+ The dataset to join to the current one, acting as the right dataset
+ in the join operation.
+ on : str
+ The column from current dataset that should be used as the on key
+ of the join operation left side.
+ by : str or list[str]
+ The columns from current dataset that should be used as the by keys
+ of the join operation left side.
+ tolerance : int
Review Comment:
This is a good question, actually. I'm not 100% sure how this is intended
to work. The C++ implementation only accepts an `int64_t` for the
tolerance and simply states that it uses the same units as the `on`
column, but it is unclear what that means in practice. I'd assumed it meant
the resolution of the timestamp when `on` is a timestamp column.
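
To make that assumption concrete, here is a minimal sketch of what it would imply for callers: the integer tolerance is a count of ticks in the `on` column's timestamp unit. `tolerance_in_units` is a hypothetical helper, not part of pyarrow or of this PR, and the tick-count reading is my interpretation rather than documented behaviour.

```python
from datetime import timedelta

# Ticks per second for each Arrow timestamp unit ("s", "ms", "us", "ns").
_TICKS_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}


def tolerance_in_units(delta, unit):
    """Hypothetical helper: express `delta` as an integer number of `unit` ticks.

    Assumes the tolerance is interpreted in the resolution of the `on`
    timestamp column, which is an assumption, not confirmed behaviour.
    """
    if unit not in _TICKS_PER_SECOND:
        raise ValueError(f"unknown timestamp unit: {unit!r}")
    # Work in whole microseconds (timedelta's native resolution) so no
    # floating-point rounding is involved.
    micros = (delta.days * 86_400 + delta.seconds) * 10**6 + delta.microseconds
    ticks, remainder = divmod(micros * _TICKS_PER_SECOND[unit], 10**6)
    if remainder:
        raise ValueError(f"{delta} is not a whole number of {unit!r} ticks")
    return ticks
```

Under this reading, a 5 second tolerance would be passed as `5` for a `timestamp[s]` column but as `5000` for a `timestamp[ms]` column, so the same integer means different durations depending on the column type.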
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.