jorisvandenbossche commented on a change in pull request #9164:
URL: https://github.com/apache/arrow/pull/9164#discussion_r555650652



##########
File path: cpp/src/arrow/compute/api_scalar.h
##########
@@ -290,17 +290,19 @@ Result<Datum> KleeneAndNot(const Datum& left, const 
Datum& right,
 /// \brief IsIn returns true for each element of `values` that is contained in
 /// `value_set`
 ///
-/// If null occurs in left, if null count in right is not 0,
-/// it returns true, else returns null.
+/// Behaviour of nulls is governed by SetLookupOptions::skip_nulls.

Review comment:
       Also, could maybe add a similar sentence "Behaviour of nulls is governed 
by SetLookupOptions::skip_nulls" to the `is_in_doc` ?

##########
File path: cpp/src/arrow/compute/kernels/scalar_set_lookup.cc
##########
@@ -272,10 +278,9 @@ struct IsInVisitor {
 
   Status Visit(const DataType&) {
     const auto& state = checked_cast<const 
SetLookupState<NullType>&>(*ctx->state());
+    // XXX should skip_nulls be taken into account?

Review comment:
       It's a bit strange corner case, but for consistency probably yes? 
   
   Right now for null array the skip_nulls argument has no effect (using this 
PR):
   
   ```
   In [27]: pc.is_in(pa.array([None]), value_set=pa.array([None]), 
skip_null=False)
   Out[27]: 
   <pyarrow.lib.BooleanArray object at 0x7fac1d800880>
   [
     true
   ]
   
   In [28]: pc.is_in(pa.array([None]), value_set=pa.array([None]), 
skip_null=True)
   Out[28]: 
   <pyarrow.lib.BooleanArray object at 0x7fac1d800ca0>
   [
     true
   ]
   ```
   while if you add a non-null type to the array creation, but use the same 
values, you get a different result:
   
   ```
   In [30]: pc.is_in(pa.array([None], type="int64"), value_set=pa.array([None], 
type="int64"), skip_null=False)
   Out[30]: 
   <pyarrow.lib.BooleanArray object at 0x7fac1d82f880>
   [
     true
   ]
   
   In [31]: pc.is_in(pa.array([None], type="int64"), value_set=pa.array([None], 
type="int64"), skip_null=True)
   Out[31]: 
   <pyarrow.lib.BooleanArray object at 0x7fac1d7c2040>
   [
     false
   ]
   ```

##########
File path: cpp/src/arrow/compute/api_scalar.h
##########
@@ -290,17 +290,19 @@ Result<Datum> KleeneAndNot(const Datum& left, const 
Datum& right,
 /// \brief IsIn returns true for each element of `values` that is contained in
 /// `value_set`
 ///
-/// If null occurs in left, if null count in right is not 0,
-/// it returns true, else returns null.
+/// Behaviour of nulls is governed by SetLookupOptions::skip_nulls.

Review comment:
       Could mention here that the default is `skip_nulls=False` (or in the 
docstring of SetLookupOptions)?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to