zeroshade commented on code in PR #34195:
URL: https://github.com/apache/arrow/pull/34195#discussion_r1131237027
##########
cpp/src/arrow/compute/kernels/vector_run_end_encode.cc:
##########
@@ -623,6 +579,13 @@ void RegisterVectorRunEndEncode(FunctionRegistry* registry) {
auto function = std::make_shared<VectorFunction>("run_end_encode",
Arity::Unary(),
run_end_encode_doc);
+ // NOTE: When the input to run_end_encode() is a ChunkedArray, the output is also a
+ // ChunkedArray with the same number of chunks as the input. Each chunk in the output
+ // has the same logical length as the corresponding chunk in the input. This simplicity
+ // has a small downside: if a run of identical values crosses a chunk boundary, this run
+ // cannot be encoded as a single run in the output. This is a conscious trade-off as
+ // trying to solve this corner-case would complicate the implementation,
+ // require reallocations, and could create surprising behavior for users of this API.
Review Comment:
I think the comment in the code here is sufficient for me. It might be
worthwhile to add a quick sentence to the function docs saying that, if you
pass a chunked array, you can end up with runs of the same value at the end
of one chunk and the start of the next, but this comment is enough to make
me happy here.
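
To make the chunk-boundary behavior concrete, here is a minimal standalone sketch. It does not use Arrow's API or reflect its actual implementation: `EncodedChunk`, `EncodeChunk`, and `EncodeChunked` are hypothetical names, and plain `std::vector<int>` stands in for an array chunk. It only illustrates the trade-off described in the code comment, namely that each chunk is encoded on its own.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one run-end encoded chunk (not an Arrow type).
// run_ends[i] is the exclusive logical end offset of run i within the chunk.
struct EncodedChunk {
  std::vector<int32_t> run_ends;
  std::vector<int> values;
};

// Encode a single chunk without looking at its neighbours.
EncodedChunk EncodeChunk(const std::vector<int>& chunk) {
  EncodedChunk out;
  for (std::size_t i = 0; i < chunk.size(); ++i) {
    // A run ends at the last element or where the next value differs.
    const bool run_ends_here =
        (i + 1 == chunk.size()) || (chunk[i + 1] != chunk[i]);
    if (run_ends_here) {
      out.run_ends.push_back(static_cast<int32_t>(i + 1));
      out.values.push_back(chunk[i]);
    }
  }
  return out;
}

// Encode chunk by chunk: the output has the same number of chunks as the
// input, and each output chunk covers the same logical length.
std::vector<EncodedChunk> EncodeChunked(
    const std::vector<std::vector<int>>& chunks) {
  std::vector<EncodedChunk> out;
  out.reserve(chunks.size());
  for (const auto& chunk : chunks) {
    out.push_back(EncodeChunk(chunk));
  }
  return out;
}
```

With the input chunks `{1, 1, 2}` and `{2, 2, 3}`, this sketch yields values `{1, 2}` and `{2, 3}`, each with run ends `{2, 3}`. The run of 2s that crosses the boundary shows up as two separate runs, one at the end of the first chunk and one at the start of the second.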