zeroshade commented on code in PR #34195:
URL: https://github.com/apache/arrow/pull/34195#discussion_r1131237027


##########
cpp/src/arrow/compute/kernels/vector_run_end_encode.cc:
##########
@@ -623,6 +579,13 @@ void RegisterVectorRunEndEncode(FunctionRegistry* registry) {
   auto function = std::make_shared<VectorFunction>("run_end_encode", Arity::Unary(),
                                                    run_end_encode_doc);
 
+  // NOTE: When the input to run_end_encode() is a ChunkedArray, the output is also a
+  // ChunkedArray with the same number of chunks as the input. Each chunk in the output
+  // has the same logical length as the corresponding chunk in the input. This simplicity
+  // has a small downside: if a run of identical values crosses a chunk boundary, this run
+  // cannot be encoded as a single run in the output. This is a conscious trade-off as
+  // trying to solve this corner-case would complicate the implementation,
+  // require reallocations, and could create surprising behavior for users of this API.

Review Comment:
   I think the comment in the code here is sufficient. It might be worthwhile
   to add a quick sentence to the function docs saying that, if you pass a
   chunked array, you can end up with runs of the same value at the end and
   start of consecutive chunks, but this comment is enough to make me happy here.
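
   To make the chunk-boundary behavior concrete, here is a minimal standalone
   sketch. It uses plain `std::vector` rather than Arrow's API, and
   `EncodedChunk`, `RunEndEncodeChunk`, and `RunEndEncodeChunked` are
   hypothetical names chosen for illustration; it only mirrors the per-chunk
   behavior described in the code comment, not the actual kernel.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one run-end encoded chunk (not an Arrow type).
// run_ends[i] is the exclusive logical end offset of run i within the chunk.
struct EncodedChunk {
  std::vector<int32_t> run_ends;
  std::vector<int> values;
};

// Encode a single chunk without looking at its neighbours.
EncodedChunk RunEndEncodeChunk(const std::vector<int>& chunk) {
  EncodedChunk out;
  for (std::size_t i = 0; i < chunk.size(); ++i) {
    if (i > 0 && chunk[i] == chunk[i - 1]) {
      // Same value as the previous element: extend the current run.
      out.run_ends.back() = static_cast<int32_t>(i + 1);
    } else {
      // Value changed (or first element): start a new run.
      out.values.push_back(chunk[i]);
      out.run_ends.push_back(static_cast<int32_t>(i + 1));
    }
  }
  return out;
}

// Encode chunk by chunk: one output chunk per input chunk, each with the
// same logical length as its input. Runs are never merged across chunks.
std::vector<EncodedChunk> RunEndEncodeChunked(
    const std::vector<std::vector<int>>& chunks) {
  std::vector<EncodedChunk> out;
  out.reserve(chunks.size());
  for (const auto& chunk : chunks) {
    out.push_back(RunEndEncodeChunk(chunk));
  }
  return out;
}
```

   With the input chunks `{1, 1, 2, 2}` and `{2, 2, 3}`, the first output
   chunk has run ends `{2, 4}` with values `{1, 2}`, and the second has run
   ends `{2, 3}` with values `{2, 3}`. The run of 2s that crosses the boundary
   shows up as two separate runs, one at the end of the first chunk and one at
   the start of the second.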


