judahrand commented on code in PR #34234:
URL: https://github.com/apache/arrow/pull/34234#discussion_r1125748731
##########
python/pyarrow/_dataset.pyx:
##########
@@ -514,6 +514,53 @@ cdef class Dataset(_Weakrefable):
                                     use_threads=use_threads, coalesce_keys=coalesce_keys,
                                     output_type=InMemoryDataset)
 
+    def join_asof(self, right_dataset, on, by, tolerance, right_on=None, right_by=None):
+        """
+        Perform an asof join between this dataset and another one.
+
+        Result of the join will be a new dataset, where further
+        operations can be applied.
+
+        Parameters
+        ----------
+        right_dataset : dataset
+            The dataset to join to the current one, acting as the right dataset
+            in the join operation.
+        on : str
+            The column from current dataset that should be used as the on key
+            of the join operation left side.
+        by : str or list[str]
+            The columns from current dataset that should be used as the by keys
+            of the join operation left side.
+        tolerance : int

Review Comment:
   This is a good question, actually. I'm not 100% sure how this is intended to work. The C++ implementation only accepts an `int64_t` for the tolerance. It simply states that the tolerance uses the same units as the `on` column, but it is unclear what that means. I'd assumed that, when `on` is a timestamp column, it means the resolution of the timestamp.
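
   To make that assumption concrete, here is a minimal sketch of what "same units as the `on` column" would imply if it does mean the timestamp resolution: the caller converts a duration into an integer number of ticks of that resolution. `tolerance_in_units` is a hypothetical helper for illustration only, not part of pyarrow, and whether the C++ node really interprets the value this way is the open question.

```python
from datetime import timedelta

# Ticks per second for each timestamp unit string ("s", "ms", "us", "ns").
_TICKS_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}


def tolerance_in_units(tolerance: timedelta, unit: str) -> int:
    """Hypothetical helper: express `tolerance` as an integer count of `unit`.

    Assumes the asof-join tolerance is counted in ticks of the `on`
    column's timestamp resolution.
    """
    if unit not in _TICKS_PER_SECOND:
        raise ValueError(f"unknown timestamp unit: {unit!r}")
    # Stay in integer microseconds (timedelta's native resolution)
    # so no float rounding is introduced.
    total_us = (
        (tolerance.days * 86_400 + tolerance.seconds) * 10**6
        + tolerance.microseconds
    )
    ticks, remainder = divmod(total_us * _TICKS_PER_SECOND[unit], 10**6)
    if remainder:
        # The duration cannot be represented exactly at this resolution.
        raise ValueError(f"{tolerance} is not a whole number of {unit!r} ticks")
    return ticks
```

   Under this reading, a 5 second tolerance would be passed as `5000` for a `timestamp[ms]` column but as `5000000000` for a `timestamp[ns]` column, which is why the docstring for `tolerance : int` probably needs to spell the unit out.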