mapleFU commented on code in PR #38437:
URL: https://github.com/apache/arrow/pull/38437#discussion_r1386772388
##########
cpp/src/parquet/encoding.cc:
##########
@@ -1205,16 +1205,22 @@ struct ArrowBinaryHelper<ByteArrayType> {
return Status::OK();
}
- Status PrepareNextInput(int64_t next_value_length,
- std::optional<int64_t>
estimated_remaining_data_length = {}) {
+ Status PrepareNextInput(int64_t next_value_length) {
+ if (ARROW_PREDICT_FALSE(!CanFit(next_value_length))) {
+ // This element would exceed the capacity of a chunk
Review Comment:
This part doesn't call
`RETURN_NOT_OK(acc_->builder->Reserve(entries_remaining_));`, this keeps some
logic same as before https://github.com/apache/arrow/pull/14341/files , it
leave `Prepare()` to allocate the data and don't push when chunk is switched.
But I don't know if this is right
##########
cpp/src/parquet/encoding.cc:
##########
@@ -1205,16 +1205,22 @@ struct ArrowBinaryHelper<ByteArrayType> {
return Status::OK();
}
- Status PrepareNextInput(int64_t next_value_length,
- std::optional<int64_t>
estimated_remaining_data_length = {}) {
+ Status PrepareNextInput(int64_t next_value_length) {
+ if (ARROW_PREDICT_FALSE(!CanFit(next_value_length))) {
+ // This element would exceed the capacity of a chunk
+ return PushChunk();
+ }
+ return Status::OK();
+ }
+
+ Status PrepareNextInputWithEstimatedLength(int64_t next_value_length,
Review Comment:
Split into two functions to make it fast and clear
##########
cpp/src/parquet/encoding.cc:
##########
@@ -1983,7 +1990,6 @@ class DictByteArrayDecoderImpl : public
DictDecoderImpl<ByteArrayType>,
int values_decoded = 0;
ArrowBinaryHelper<ByteArrayType> helper(out, num_values);
- RETURN_NOT_OK(helper.Prepare(len_));
Review Comment:
Ditto
##########
cpp/src/parquet/encoding.cc:
##########
@@ -1915,7 +1923,6 @@ class DictByteArrayDecoderImpl : public
DictDecoderImpl<ByteArrayType>,
int32_t indices[kBufferSize];
ArrowBinaryHelper<ByteArrayType> helper(out, num_values);
- RETURN_NOT_OK(helper.Prepare());
Review Comment:
Though I think this Prepare is ok, original code doesn't have this Prepare,
perhaps it would make entries large?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]