Jefffrey commented on code in PR #10232:
URL: https://github.com/apache/arrow-rs/pull/10232#discussion_r3489050023
##########
parquet/src/arrow/buffer/dictionary_buffer.rs:
##########
@@ -185,8 +185,18 @@ impl<K: ArrowNativeType + Ord, V: OffsetSizeTrait>
DictionaryBuffer<K, V> {
ArrowType::Dictionary(k, v) => (k, v.as_ref().clone()),
_ => unreachable!(),
};
+ let array = if let ArrowType::FixedSizeBinary(size) =
value_type {
Review Comment:
Matching what was done above in the `Self::Dict` path:
https://github.com/apache/arrow-rs/blob/7616e10f12894232eacb807993e55427a067d061/parquet/src/arrow/buffer/dictionary_buffer.rs#L159-L168
- as introduced originally by https://github.com/apache/arrow-rs/pull/7446
This is because `values.into_array()` assumed a generic byte offset array:
https://github.com/apache/arrow-rs/blob/7616e10f12894232eacb807993e55427a067d061/parquet/src/arrow/buffer/offset_buffer.rs#L132-L146
Which is the wrong structure for a fixedsizebinary array (we don't have
offsets buffer, etc.)
##########
parquet/src/arrow/arrow_writer/mod.rs:
##########
@@ -2954,8 +2944,12 @@ mod tests {
)
.unwrap();
- let data = DictionaryArray::<K>::new(keys, Arc::new(values));
- let batch = RecordBatch::try_new(Arc::new(schema),
vec![Arc::new(data)]).unwrap();
+ let data = Arc::new(DictionaryArray::<K>::new(keys,
Arc::new(values))) as ArrayRef;
+ one_column_roundtrip(Arc::clone(&data), true);
Review Comment:
From the reproduction, added `one_column_roundtrip()` usage here to test
roundtrip where one of the cases is disabling dictionary encoding entirely
Rest of test changes is just restructuring
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]