arthurpassos commented on code in PR #35825:
URL: https://github.com/apache/arrow/pull/35825#discussion_r1230911188


##########
cpp/src/parquet/encoding.cc:
##########
@@ -1854,15 +1920,30 @@ void DictDecoderImpl<ByteArrayType>::InsertDictionary(::arrow::ArrayBuilder* bui
   PARQUET_THROW_NOT_OK(binary_builder->InsertMemoValues(*arr));
 }
 
-class DictByteArrayDecoderImpl : public DictDecoderImpl<ByteArrayType>,
-                                 virtual public ByteArrayDecoder {
+template <>
+void DictDecoderImpl<LargeByteArrayType>::InsertDictionary(
+    ::arrow::ArrayBuilder* builder) {
+  auto binary_builder = checked_cast<::arrow::LargeBinaryDictionary32Builder*>(builder);
+
+  // Make a LargeBinaryArray referencing the internal dictionary data
+  auto arr = std::make_shared<::arrow::LargeBinaryArray>(
+      dictionary_length_, byte_array_offsets_, byte_array_data_);

Review Comment:
   Hm, I'm not sure I follow. Do you mean allocating it as `int64_t` in the code below?
   
   ```
       PARQUET_THROW_NOT_OK(
           byte_array_offsets_->Resize((dictionary_length_ + 1) * sizeof(int32_t),
                                       /*shrink_to_fit=*/false));
   ```
   
   That does not seem to make much sense to me, because it stores `ByteArray` values (each with a 32-bit length). The logic in `DictDecoderImpl::SetByteArrayDict` would also have to change. I might also be confusing this with the dictionary we agreed would be 32-bit.
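
To make the width question concrete, here is a minimal standalone sketch, not the Parquet or Arrow implementation. It assumes a hypothetical `ByteArrayLike` struct with a 32-bit length and two hypothetical helpers, `BuildOffsets` and `OffsetsBufferBytes`. It shows how `n + 1` offsets are accumulated from 32-bit lengths at either width, and how the buffer size passed to a `Resize`-style call changes with `sizeof(OffsetT)`. Arrow's `LargeBinaryArray` reads its offsets as 64-bit values, which is why the width of this buffer matters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for a Parquet byte array value: the length is 32-bit.
struct ByteArrayLike {
  uint32_t len;
  const uint8_t* ptr;
};

// Accumulates values.size() + 1 offsets of width OffsetT from 32-bit lengths.
// The assert documents the limit: the running total must fit in OffsetT.
template <typename OffsetT>
std::vector<OffsetT> BuildOffsets(const std::vector<ByteArrayLike>& values) {
  std::vector<OffsetT> offsets;
  offsets.reserve(values.size() + 1);
  OffsetT total = 0;
  offsets.push_back(total);
  for (const auto& v : values) {
    assert(static_cast<uint64_t>(v.len) <=
           static_cast<uint64_t>(std::numeric_limits<OffsetT>::max() - total));
    total += static_cast<OffsetT>(v.len);
    offsets.push_back(total);
  }
  return offsets;
}

// Bytes needed for the offsets buffer of a dictionary with `length` entries.
template <typename OffsetT>
std::size_t OffsetsBufferBytes(std::size_t length) {
  return (length + 1) * sizeof(OffsetT);
}
```

Under these assumptions, each individual length always fits in 32 bits, but the offsets are running totals, so only the 64-bit instantiation can describe a dictionary whose combined data exceeds 2 GiB.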



