wjones127 commented on code in PR #14353:
URL: https://github.com/apache/arrow/pull/14353#discussion_r1083109332
##########
cpp/src/parquet/encoding.h:
##########
@@ -317,6 +317,13 @@ class TypedDecoder : virtual public Decoder {
int64_t valid_bits_offset,
typename EncodingTraits<DType>::Accumulator* out) =
0;
+ virtual int DecodeArrow_opt(int num_values, int null_count, const uint8_t*
valid_bits,
+ int32_t* offset,
+ std::shared_ptr<::arrow::ResizableBuffer>&
values,
+ int64_t valid_bits_offset, int32_t*
bianry_length) {
+ return 0;
Review Comment:
It seems like we are adding this because the other methods are based on
builders (in the accumulator), and builder don't provide a way to transfer
multiple values in one `memcpy`. Does that sound right?
I wonder if we could add such a method on builders, and that might be a
cleaner solution. Something like:
```cpp
class BaseBinaryBuilder {
...
Status UnsafeAppendValues(const uint8_t* values, int64_t length, const
uint8_t* valid_bytes = NULL_PTR) { ... }
};
```
The reason getting back to builders might be good is that I don't think
these changes handle the case where there are more values than can fit into a
StringArray. The current implementation will split it up into chunks if it
reaches capacity, and I think we need to keep that behavior.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]