felipecrv commented on code in PR #34195:
URL: https://github.com/apache/arrow/pull/34195#discussion_r1130255227


##########
cpp/src/arrow/compute/kernels/vector_run_end_encode.cc:
##########
@@ -623,6 +579,13 @@ void RegisterVectorRunEndEncode(FunctionRegistry* registry) {
   auto function = std::make_shared<VectorFunction>("run_end_encode", Arity::Unary(),
                                                    run_end_encode_doc);
 
+  // NOTE: When the input to run_end_encode() is a ChunkedArray, the output is also a
+  // ChunkedArray with the same number of chunks as the input. Each chunk in the output
+  // has the same logical length as the corresponding chunk in the input. This simplicity
+  // has a small downside: if a run of identical values crosses a chunk boundary, this run
+  // cannot be encoded as a single run in the output. This is a conscious trade-off as
+  // trying to solve this corner-case would complicate the implementation,
+  // require reallocations, and could create surprising behavior for users of this API.

Review Comment:
   @zeroshade what do you think? Should this be part of the function documentation?
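
To make the trade-off in the NOTE concrete, here is a minimal, dependency-free sketch of per-chunk run-end encoding. It is not the Arrow kernel: `ReeChunk`, `EncodeChunk`, and `EncodeChunked` are hypothetical names invented for illustration, and plain `std::vector<int>` stands in for Arrow arrays. It only shows why a run that crosses a chunk boundary ends up as two runs when each chunk is encoded on its own.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one run-end encoded chunk (not an Arrow type).
struct ReeChunk {
  std::vector<int32_t> run_ends;  // cumulative, exclusive end index of each run
  std::vector<int> values;        // one value per run
};

// Encode a single chunk without looking at its neighbours.
ReeChunk EncodeChunk(const std::vector<int>& chunk) {
  ReeChunk out;
  for (std::size_t i = 0; i < chunk.size(); ++i) {
    if (i > 0 && chunk[i] == chunk[i - 1]) {
      // Same value as the previous element: extend the current run.
      out.run_ends.back() = static_cast<int32_t>(i + 1);
    } else {
      // New value: open a new run that ends after this element.
      out.values.push_back(chunk[i]);
      out.run_ends.push_back(static_cast<int32_t>(i + 1));
    }
  }
  return out;
}

// Encode every chunk independently: same chunk count, and each output chunk
// has the same logical length as its input chunk.
std::vector<ReeChunk> EncodeChunked(const std::vector<std::vector<int>>& chunks) {
  std::vector<ReeChunk> out;
  out.reserve(chunks.size());
  for (const auto& chunk : chunks) {
    out.push_back(EncodeChunk(chunk));
  }
  return out;
}
```

With the input chunks `{1, 1, 2, 2}` and `{2, 2, 3}`, the four consecutive `2`s span the boundary, so the sketch emits one run of `2` ending chunk 0 and another run of `2` starting chunk 1. Merging them would require moving elements between chunks, which is the complication the NOTE chooses to avoid.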



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
