arthurpassos commented on code in PR #35825:
URL: https://github.com/apache/arrow/pull/35825#discussion_r1229642355
##########
cpp/src/parquet/encoding.cc:
##########
@@ -1854,15 +1920,30 @@ void DictDecoderImpl<ByteArrayType>::InsertDictionary(::arrow::ArrayBuilder* bui
PARQUET_THROW_NOT_OK(binary_builder->InsertMemoValues(*arr));
}
-class DictByteArrayDecoderImpl : public DictDecoderImpl<ByteArrayType>,
-                                 virtual public ByteArrayDecoder {
+template <>
+void DictDecoderImpl<LargeByteArrayType>::InsertDictionary(
+    ::arrow::ArrayBuilder* builder) {
+  auto binary_builder =
+      checked_cast<::arrow::LargeBinaryDictionary32Builder*>(builder);
+
+  // Make a LargeBinaryArray referencing the internal dictionary data
+  auto arr = std::make_shared<::arrow::LargeBinaryArray>(
+      dictionary_length_, byte_array_offsets_, byte_array_data_);
Review Comment:
Hm.. this might actually be a problem, if I understood it correctly.
::arrow::LargeBinaryArray uses ::arrow::LargeBinaryType, which defines
offset_type to be 64 bits. byte_array_offsets_ is just a raw buffer, so I
assume that when the array reads it there will be a "blind" cast that treats
the offsets as 64 bits wide; since we are passing a buffer of 32-bit offsets,
that would misinterpret the data.
Is that correct?
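
For illustration, here is a minimal standalone sketch of the mismatch I mean
(my own example, not code from the PR): it hands a buffer of int32 offsets to
::arrow::LargeBinaryArray and reads them back through raw_value_offsets(),
which is typed const int64_t*.

```cpp
#include <cstdint>
#include <iostream>
#include <memory>

#include "arrow/api.h"

int main() {
  // Two values, "ab" and "cd", described by 32-bit offsets {0, 2, 4}.
  const int32_t offsets32[] = {0, 2, 4};
  const uint8_t data[] = {'a', 'b', 'c', 'd'};

  // Non-owning buffers over the raw bytes above.
  auto offsets_buf = std::make_shared<::arrow::Buffer>(
      reinterpret_cast<const uint8_t*>(offsets32), sizeof(offsets32));
  auto data_buf = std::make_shared<::arrow::Buffer>(data, sizeof(data));

  // LargeBinaryArray has offset_type = int64_t, but nothing checks the
  // element width of the offsets buffer we pass in.
  ::arrow::LargeBinaryArray arr(/*length=*/2, offsets_buf, data_buf);

  // raw_value_offsets() reinterprets the bytes as int64_t, so the first
  // read fuses offsets32[0] and offsets32[1] into one value: on a
  // little-endian machine this prints 8589934592 (0x2'00000000), not 0.
  std::cout << arr.raw_value_offsets()[0] << std::endl;

  // Worse, length 2 implies 3 * 8 = 24 bytes of offsets, but the buffer
  // only holds 12, so later reads would run past the allocation.
  return 0;
}
```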
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.