felipecrv commented on code in PR #43256:
URL: https://github.com/apache/arrow/pull/43256#discussion_r1697467263


##########
cpp/src/arrow/compute/expression.cc:
##########
@@ -1242,8 +1273,92 @@ struct Inequality {
                             /*insert_implicit_casts=*/false, &exec_context);
   }
 
+  /// Simplify an `is_in` value set against an inequality guarantee.
+  ///
+  /// Simplifying an `is_in` predicate involves filtering out any values from
+  /// the value set that cannot possibly be found given the guarantee. For
+  /// example, if we have the predicate 'x is_in [1, 2, 3, 4]' and the 
guarantee
+  /// 'x > 2', then the simplified predicate 'x is_in [3, 4]' is equivalent.
+  /// This can be done efficiently if the value set is sorted and unique by
+  /// binary searching the inequality gound and slicing the value set
+  /// accordingly.
+  ///
+  /// \pre `guarantee` is non-nullable
+  /// \pre `guarantee.bound` is a scalar
+  /// \pre `guarantee.bound.type()->id() == value_set->type_id()`
+  /// \pre `value_set` is non-empty
+  /// \return a simplified value set, or a bool if the simplification results 
in
+  ///   a boolean literal predicate.
+  template <typename ArrowType>
+  static std::variant<std::shared_ptr<Array>, bool> SimplifyIsInValueSet(

Review Comment:
   Totally overkill to have a full compute function. The binary search should 
be extracted as a local C++ function template here because it really is the 
only part that needs ~17 template specializations. That will require moving the 
`switch` into this function.



##########
cpp/src/arrow/compute/expression.cc:
##########
@@ -1242,8 +1273,92 @@ struct Inequality {
                             /*insert_implicit_casts=*/false, &exec_context);
   }
 
+  /// Simplify an `is_in` value set against an inequality guarantee.
+  ///
+  /// Simplifying an `is_in` predicate involves filtering out any values from
+  /// the value set that cannot possibly be found given the guarantee. For
+  /// example, if we have the predicate 'x is_in [1, 2, 3, 4]' and the 
guarantee
+  /// 'x > 2', then the simplified predicate 'x is_in [3, 4]' is equivalent.
+  /// This can be done efficiently if the value set is sorted and unique by
+  /// binary searching the inequality gound and slicing the value set
+  /// accordingly.
+  ///
+  /// \pre `guarantee` is non-nullable
+  /// \pre `guarantee.bound` is a scalar
+  /// \pre `guarantee.bound.type()->id() == value_set->type_id()`
+  /// \pre `value_set` is non-empty
+  /// \return a simplified value set, or a bool if the simplification results 
in
+  ///   a boolean literal predicate.
+  template <typename ArrowType>
+  static std::variant<std::shared_ptr<Array>, bool> SimplifyIsInValueSet(

Review Comment:
   Totally overkill to have a full compute function. But the binary search 
should be extracted as a local C++ function template here because it really is 
the only part that needs ~17 template specializations. That will require moving 
the `switch` into this function.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to