zanmato1984 commented on code in PR #44394:
URL: https://github.com/apache/arrow/pull/44394#discussion_r1823743699
##########
cpp/src/arrow/compute/api_vector.h:
##########
@@ -705,5 +736,56 @@ Result<std::shared_ptr<Array>> PairwiseDiff(const Array& array,
bool check_overflow = false,
ExecContext* ctx = NULLPTR);
+/// \brief Return the reverse indices of the given indices.
+///
+/// For indices[i] = x, reverse_indices[x] = i. And reverse_indices[x] = null if x does
+/// not appear in the input indices. For indices[i] = x where x < 0 or x >= output_length,
+/// it is ignored. If multiple indices point to the same value, the last one is used.
+///
+/// For example, with indices = [null, 0, 3, 2, 4, 1, 1], the reverse indices is
+/// [1, 6, 3] if output_length = 3,
+/// [1, 6, 3, 2, 4, null, null] if output_length = 7.
Review Comment:
An explicit `output_length` is particularly useful when the input indices cover
only a subset of the output, in other words when the output contains "holes"
(`null`s) that no element of the input points to. Consider the input `[3, 0]`:
imagine that rows `1` and `2` evaluate to `null` in the condition of the
`if_else` special form, so they go to neither the `true` nor the `false` branch
and therefore have no slots in the input to `reverse_indices`/`permute`. The
desired output is then `[1, null, null, 0]`, as opposed to the length-2 output
`[1, null]` that one might otherwise presume.
This is also noted in my comments in one of the tests:
https://github.com/zanmato1984/arrow/blob/944609c0f4c68a73633d32ca1fd75c2a57198eb2/cpp/src/arrow/compute/kernels/vector_placement_test.cc#L995-L1000
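To make the semantics concrete, here is a minimal standalone sketch of the
behavior described in the doc comment. It is not the Arrow kernel and does not
use the Arrow API: `ReverseIndicesSketch` and `MaybeIndex` are hypothetical
names, and `std::nullopt` stands in for `null`.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for a nullable index; std::nullopt plays the role of null.
using MaybeIndex = std::optional<int64_t>;

// Illustrative sketch only (not the Arrow implementation):
// for indices[i] = x, the result has reversed[x] = i.
std::vector<MaybeIndex> ReverseIndicesSketch(const std::vector<MaybeIndex>& indices,
                                             int64_t output_length) {
  // Assumption of this sketch: a negative output_length is treated as 0.
  if (output_length < 0) output_length = 0;
  // Every output slot starts as null; slots that no input index points to stay null.
  std::vector<MaybeIndex> reversed(static_cast<size_t>(output_length), std::nullopt);
  for (int64_t i = 0; i < static_cast<int64_t>(indices.size()); ++i) {
    const MaybeIndex& x = indices[static_cast<size_t>(i)];
    // Null and out-of-range indices are ignored.
    if (!x.has_value() || *x < 0 || *x >= output_length) continue;
    // Later duplicates overwrite earlier ones, so the last occurrence wins.
    reversed[static_cast<size_t>(*x)] = i;
  }
  return reversed;
}
```

With the input `[3, 0]`, this sketch yields `[1, null, null, 0]` for an output
length of 4 and `[1, null]` for an output length of 2 (index `3` is then out of
range and ignored).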