pitrou commented on a change in pull request #11886:
URL: https://github.com/apache/arrow/pull/11886#discussion_r777518602



##########
File path: cpp/src/arrow/compute/kernels/vector_selection_test.cc
##########
@@ -2328,5 +2328,40 @@ TEST_F(TestDropNullKernelWithTable, DropNullTableWithSlices) {
   });
 }
 
+TEST(TestIndicesNonZero, IndicesNonZero) {
+  Datum actual;
+  std::shared_ptr<Array> result;
+
+  ASSERT_OK_AND_ASSIGN(
+      actual,
+      CallFunction("indices_nonzero", {ArrayFromJSON(uint32(), "[null, 50, 0, 10]")}));
+  result = actual.make_array();
+  AssertArraysEqual(*result, *ArrayFromJSON(uint64(), "[1, 3]"));
+
+  ASSERT_OK_AND_ASSIGN(
+      actual, CallFunction("indices_nonzero",
+                           {ArrayFromJSON(boolean(), "[null, true, false, true]")}));
+  result = actual.make_array();
+  AssertArraysEqual(*result, *ArrayFromJSON(uint64(), "[1, 3]"));
+
+  ASSERT_OK_AND_ASSIGN(actual,
+                       CallFunction("indices_nonzero",
+                                    {ArrayFromJSON(float64(), "[null, 1.3, 0.0, 5.0]")}));
+  result = actual.make_array();
+  AssertArraysEqual(*result, *ArrayFromJSON(uint64(), "[1, 3]"));
+
+  ASSERT_OK_AND_ASSIGN(actual,
+                       CallFunction("indices_nonzero", {ArrayFromJSON(float64(), "[]")}));
+  result = actual.make_array();
+  AssertArraysEqual(*result, *ArrayFromJSON(uint64(), "[]"));
+
+  ChunkedArray chunkedarr(
+      {ArrayFromJSON(uint32(), "[1, 0, 3]"), ArrayFromJSON(uint32(), "[4, 0, 6]")});
+  ASSERT_OK_AND_ASSIGN(actual,
+                       CallFunction("indices_nonzero", {static_cast<Datum>(chunkedarr)}));
+  Datum expected = ChunkedArrayFromJSON(uint64(), {R"([0, 2])", R"([0, 2])"});

Review comment:
       Hmm, ok, so this shows that the result is unexpected. The kernel should return the logical indices inside the chunked array, so in this case `[0, 2, 3, 5]` (the number of chunks in the result is an implementation detail, though).
   
   Consider for example:
   ```python
   >>> pc.sort_indices(pa.chunked_array([[1, 0, 3], [5, 0, 2]]))
   <pyarrow.lib.UInt64Array object at 0x7f87c24145e0>
   [
     1,
     4,
     0,
     5,
     2,
     3
   ]
   ```
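
The logical indexing described in the comment above can be sketched in plain Python. This is only an illustration of accumulating an offset across chunks, not Arrow's kernel: the helper name `logical_indices_nonzero` is hypothetical, chunks are modelled as plain lists, and `None` stands in for a null.

```python
def logical_indices_nonzero(chunks):
    """Return positions of non-null, non-zero values, counted across all chunks."""
    indices = []
    offset = 0  # number of elements seen in earlier chunks
    for chunk in chunks:
        for i, value in enumerate(chunk):
            # None models a null here; this is an assumption of the sketch.
            if value is not None and value != 0:
                indices.append(offset + i)
        offset += len(chunk)
    return indices


print(logical_indices_nonzero([[1, 0, 3], [4, 0, 6]]))  # [0, 2, 3, 5]
```

The second chunk's hits are reported as 3 and 5 rather than 0 and 2, because the running offset carries the length of the first chunk forward.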

##########
File path: docs/source/cpp/compute.rst
##########
@@ -1549,6 +1549,17 @@ These functions select and return a subset of their input.
 * \(4) For each element *i* in input 2 (the indices), the *i*'th element
   in input 1 (the values) is appended to the output.
 
+Containment tests
+~~~~~~~~~~~~~~~~~
+
+This functions return the indices at which elements match a predicate

Review comment:
       ```suggestion
   This function returns the indices at which array elements are non-null and non-zero.
   ```

##########
File path: cpp/src/arrow/compute/kernels/vector_selection_test.cc
##########
@@ -2328,5 +2328,40 @@ TEST_F(TestDropNullKernelWithTable, DropNullTableWithSlices) {
   });
 }
 
+TEST(TestIndicesNonZero, IndicesNonZero) {
+  Datum actual;
+  std::shared_ptr<Array> result;
+
+  ASSERT_OK_AND_ASSIGN(
+      actual,
+      CallFunction("indices_nonzero", {ArrayFromJSON(uint32(), "[null, 50, 0, 10]")}));
+  result = actual.make_array();
+  AssertArraysEqual(*result, *ArrayFromJSON(uint64(), "[1, 3]"));
+
+  ASSERT_OK_AND_ASSIGN(
+      actual, CallFunction("indices_nonzero",
+                           {ArrayFromJSON(boolean(), "[null, true, false, true]")}));
+  result = actual.make_array();
+  AssertArraysEqual(*result, *ArrayFromJSON(uint64(), "[1, 3]"));
+
+  ASSERT_OK_AND_ASSIGN(actual,
+                       CallFunction("indices_nonzero",
+                                    {ArrayFromJSON(float64(), "[null, 1.3, 0.0, 5.0]")}));
+  result = actual.make_array();
+  AssertArraysEqual(*result, *ArrayFromJSON(uint64(), "[1, 3]"));
+
+  ASSERT_OK_AND_ASSIGN(actual,
+                       CallFunction("indices_nonzero", {ArrayFromJSON(float64(), "[]")}));
+  result = actual.make_array();
+  AssertArraysEqual(*result, *ArrayFromJSON(uint64(), "[]"));
+
+  ChunkedArray chunkedarr(
+      {ArrayFromJSON(uint32(), "[1, 0, 3]"), ArrayFromJSON(uint32(), "[4, 0, 6]")});
+  ASSERT_OK_AND_ASSIGN(actual,
+                       CallFunction("indices_nonzero", {static_cast<Datum>(chunkedarr)}));
+  Datum expected = ChunkedArrayFromJSON(uint64(), {R"([0, 2])", R"([0, 2])"});

Review comment:
       So, you probably want to set `can_execute_chunkwise` to false on the `VectorKernel`, and be prepared to receive a chunked array input.
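
A rough model of why chunk-wise execution yields the result seen in the test: when one per-array function is applied to each chunk independently, every chunk's indices restart at zero. The names `nonzero_local` and `run_chunkwise` are hypothetical and only mimic that behaviour in plain Python; they are not Arrow APIs and say nothing about how Arrow actually dispatches kernels.

```python
def nonzero_local(chunk):
    """Indices of non-null, non-zero values, relative to this chunk only."""
    return [i for i, v in enumerate(chunk) if v is not None and v != 0]


def run_chunkwise(kernel, chunks):
    """Apply the kernel to each chunk separately, as chunk-wise execution would."""
    return [kernel(chunk) for chunk in chunks]


chunks = [[1, 0, 3], [4, 0, 6]]
print(run_chunkwise(nonzero_local, chunks))  # [[0, 2], [0, 2]]
```

Because each call sees a single chunk, it has no way to know how many elements came before it, which is why the function needs the whole chunked input to report logical positions.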

##########
File path: docs/source/cpp/compute.rst
##########
@@ -1549,6 +1549,17 @@ These functions select and return a subset of their input.
 * \(4) For each element *i* in input 2 (the indices), the *i*'th element
   in input 1 (the values) is appended to the output.
 
+Containment tests
+~~~~~~~~~~~~~~~~~
+
+This functions return the indices at which elements match a predicate
+
++-----------------------+-------+-----------------------------------+----------------+---------------------------------+-------+
+| Function name         | Arity | Input types                       | Output type    | Options class                   | Notes |

Review comment:
       No need to add "Options class" and "Notes" columns if they remain empty, IMHO.





