judahrand commented on code in PR #34234:
URL: https://github.com/apache/arrow/pull/34234#discussion_r1125748731


##########
python/pyarrow/_dataset.pyx:
##########
@@ -514,6 +514,53 @@ cdef class Dataset(_Weakrefable):
                                               use_threads=use_threads, coalesce_keys=coalesce_keys,
                                               output_type=InMemoryDataset)
 
+    def join_asof(self, right_dataset, on, by, tolerance, right_on=None, right_by=None):
+        """
+        Perform an asof join between this dataset and another one.
+
+        Result of the join will be a new dataset, where further
+        operations can be applied.
+
+        Parameters
+        ----------
+        right_dataset : dataset
+            The dataset to join to the current one, acting as the right dataset
+            in the join operation.
+        on : str
+            The column from current dataset that should be used as the on key
+            of the join operation left side.
+        by : str or list[str]
+            The columns from current dataset that should be used as the by keys
+            of the join operation left side.
+        tolerance : int

Review Comment:
   This is a good question, actually. I'm not 100% sure how this is intended
to work. The C++ implementation only accepts an `int64_t` for the tolerance,
and it states only that the value uses the same units as the `on` column,
without spelling out what that means. I'd assumed that, when `on` is a
timestamp column, it means the resolution (time unit) of that timestamp.
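
To make that assumption concrete, here is a minimal sketch of what it would imply for a caller: a duration has to be converted to an integer count in the `on` column's time unit before it is passed as `tolerance`. The helper `tolerance_in_units` is hypothetical and not part of pyarrow. The only Arrow-specific detail it relies on is the set of timestamp unit names (`s`, `ms`, `us`, `ns`), and whether the C++ node really interprets the value this way is exactly the open question.

```python
from datetime import timedelta

# Nanoseconds per Arrow timestamp unit name.
_NS_PER_UNIT = {"s": 1_000_000_000, "ms": 1_000_000, "us": 1_000, "ns": 1}


def tolerance_in_units(delta, unit):
    """Hypothetical helper: express `delta` as an integer count of `unit`.

    Assumes the asof-join tolerance is read in the resolution of the
    timestamp `on` column, which is an assumption, not confirmed behaviour.
    """
    # timedelta has microsecond resolution, so this conversion is exact.
    total_ns = (delta // timedelta(microseconds=1)) * 1_000
    ns_per_unit = _NS_PER_UNIT[unit]
    if total_ns % ns_per_unit:
        raise ValueError(f"{delta!r} is not a whole number of {unit!r}")
    return total_ns // ns_per_unit
```

Under that reading, a 5 second tolerance would be passed as `tolerance_in_units(timedelta(seconds=5), "ms")` when the `on` column is `timestamp[ms]`, and the same duration would need a different integer for a `timestamp[ns]` column.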



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

