jorisvandenbossche commented on code in PR #13281:
URL: https://github.com/apache/arrow/pull/13281#discussion_r886845637


##########
python/pyarrow/_exec_plan.pyx:
##########
@@ -259,13 +259,19 @@ def _perform_join(join_type, left_operand not None, left_keys,
         left_columns = []
     elif join_type == "inner":
         c_join_type = CJoinType_INNER
-        right_columns = set(right_columns) - set(right_keys)
+        right_columns = [
+            col for col in right_columns if col not in set(right_keys)

Review Comment:
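   Not that it matters much, since these are small numbers anyway, but I was 
wondering about it: `set(right_keys)` sits in the condition of the list 
comprehension, so the set is rebuilt for every column. Given that `right_keys` 
is typically a short list, converting it to a set only introduces more overhead.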
   
   ```
   In [16]: right_columns = ["a", "b", "c", "d", "e", "f"]
   
   In [17]: right_keys = ["b", "c"]
   
   In [18]: [col for col in right_columns if col not in set(right_keys)]
   Out[18]: ['a', 'd', 'e', 'f']
   
   In [19]: %timeit [col for col in right_columns if col not in set(right_keys)]
   691 ns ± 3.92 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
   
   In [20]: %timeit [col for col in right_columns if col not in right_keys]
   353 ns ± 19.1 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
   ```
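
For comparison, here is a minimal sketch of the three variants side by side, including one that builds the set once before the comprehension. The function names are hypothetical and only for illustration, they are not part of `_exec_plan.pyx`. The timings depend on the machine and on the list sizes, so no numbers are claimed here.

```python
import timeit

right_columns = ["a", "b", "c", "d", "e", "f"]
right_keys = ["b", "c"]


def filter_rebuilding_set(columns, keys):
    # set(keys) is evaluated again for every element of `columns`
    return [col for col in columns if col not in set(keys)]


def filter_with_list(columns, keys):
    # Linear scan of `keys` per element; cheap when `keys` is short
    return [col for col in columns if col not in keys]


def filter_with_hoisted_set(columns, keys):
    # Build the set once, then do hash lookups inside the comprehension
    key_set = set(keys)
    return [col for col in columns if col not in key_set]


variants = (filter_rebuilding_set, filter_with_list, filter_with_hoisted_set)

# All three variants keep the original column order and give the same result
results = [func(right_columns, right_keys) for func in variants]
assert all(result == results[0] for result in results)

for func in variants:
    seconds = timeit.timeit(
        lambda: func(right_columns, right_keys), number=100_000
    )
    print(f"{func.__name__}: {seconds:.4f} s for 100,000 calls")
```

Unlike the original `set(right_columns) - set(right_keys)`, all of these preserve the order of `right_columns`. For a key list this short, the plain list lookup is probably the simplest choice.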


