wjones127 commented on code in PR #14353:
URL: https://github.com/apache/arrow/pull/14353#discussion_r1083109332


##########
cpp/src/parquet/encoding.h:
##########
@@ -317,6 +317,13 @@ class TypedDecoder : virtual public Decoder {
                           int64_t valid_bits_offset,
                           typename EncodingTraits<DType>::Accumulator* out) = 
0;
 
+  virtual int DecodeArrow_opt(int num_values, int null_count, const uint8_t* 
valid_bits,
+                              int32_t* offset,
+                              std::shared_ptr<::arrow::ResizableBuffer>& 
values,
+                              int64_t valid_bits_offset, int32_t* 
bianry_length) {
+    return 0;

Review Comment:
   It seems like we are adding this because the other methods are based on 
builders (in the accumulator), and builder don't provide a way to transfer 
multiple values in one `memcpy`. Does that sound right?
   
   I wonder if we could add such a method on builders, and that might be a 
cleaner solution. Something like:
   
   ```cpp
   class BaseBinaryBuilder {
   ...
     Status UnsafeAppendValues(const uint8_t* values, int64_t length, const 
uint8_t* valid_bytes = NULL_PTR) { ... }
   };
   ```
   
   The reason getting back to builders might be good is that I don't think 
these changes handle the case where there are more values than can fit into a 
StringArray. The current implementation will split it up into chunks if it 
reaches capacity, and I think we need to keep that behavior.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to