scovich commented on code in PR #7987:
URL: https://github.com/apache/arrow-rs/pull/7987#discussion_r2232259020
##########
parquet-variant/src/builder.rs:
##########
@@ -1256,29 +1221,28 @@ impl<'a> ListBuilder<'a> {
let starting_offset = self.parent_value_offset_base;
+ let header_size = 1 + // header
+ if is_large { 4 } else { 1 } + // is_large
Review Comment:
```suggestion
let num_elements_size = if is_large { 4 } else { 1 };
let header_size = 1 + // header
num_elements_size + // num_elements
```
(and then can reuse `num_elements_size` below)
##########
parquet-variant/src/builder.rs:
##########
@@ -1121,16 +1099,27 @@ impl VariantBuilder {
pub struct ListBuilder<'a> {
parent_state: ParentState<'a>,
offsets: Vec<usize>,
- buffer: ValueBuffer,
+ /// The starting offset in the parent's buffer where this list starts
+ parent_value_offset_base: usize,
+ /// The starting offset in the parent's metadata buffer where this list starts
+ /// used to truncate the written fields in `drop` if the current list has not been finished
+ parent_metadata_offset_base: usize,
+ /// Whether the list has been finished, the written content of the current list
+ /// will be truncated in `drop` if `has_been_finished` is false
+ has_been_finished: bool,
validate_unique_fields: bool,
}
impl<'a> ListBuilder<'a> {
fn new(parent_state: ParentState<'a>, validate_unique_fields: bool) -> Self {
+ let offset_base = parent_state.buffer_current_offset();
+ let meta_offset_base = parent_state.metadata_current_offset();
Self {
parent_state,
offsets: vec![],
- buffer: ValueBuffer::default(),
+ parent_value_offset_base: offset_base,
+ has_been_finished: false,
+ parent_metadata_offset_base: meta_offset_base,
Review Comment:
If we're spelling out `field: value` anyway, why not just fold the logic in directly?
```suggestion
parent_value_offset_base: parent_state.buffer_current_offset(),
has_been_finished: false,
parent_metadata_offset_base:
parent_state.metadata_current_offset(),
```
Alternatively, the `let` above could give the correct name from the start,
so it can just be passed directly:
```suggestion
parent_value_offset_base,
has_been_finished: false,
parent_metadata_offset_base,
```
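For illustration, here is a minimal sketch of the second variant. `ParentState` below is a hypothetical stand-in with only the two accessors (the real type carries a lifetime and more state), and `validate_unique_fields` is omitted. One thing to keep in mind with the first variant: struct field initializers are evaluated in the order they are written, so any value derived from `parent_state` has to be read before the `parent_state,` field moves it. Computing the offsets in `let` bindings first sidesteps that.

```rust
/// Hypothetical stand-in for the real `ParentState`; only the two
/// accessors used by the suggestion are modelled here.
struct ParentState {
    buffer_len: usize,
    metadata_len: usize,
}

impl ParentState {
    fn buffer_current_offset(&self) -> usize {
        self.buffer_len
    }

    fn metadata_current_offset(&self) -> usize {
        self.metadata_len
    }
}

/// Simplified `ListBuilder` (no lifetime, no `validate_unique_fields`).
struct ListBuilder {
    parent_state: ParentState,
    offsets: Vec<usize>,
    parent_value_offset_base: usize,
    parent_metadata_offset_base: usize,
    has_been_finished: bool,
}

impl ListBuilder {
    fn new(parent_state: ParentState) -> Self {
        // Read both offsets before `parent_state` is moved into the struct,
        // and name the bindings after the fields so shorthand init works.
        let parent_value_offset_base = parent_state.buffer_current_offset();
        let parent_metadata_offset_base = parent_state.metadata_current_offset();
        Self {
            parent_state,
            offsets: vec![],
            parent_value_offset_base,
            has_been_finished: false,
            parent_metadata_offset_base,
        }
    }
}
```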
##########
parquet-variant/src/builder.rs:
##########
@@ -1256,29 +1221,28 @@ impl<'a> ListBuilder<'a> {
let starting_offset = self.parent_value_offset_base;
+ let header_size = 1 + // header
+ if is_large { 4 } else { 1 } + // is_large
+ (self.offsets.len() + 1) * offset_size as usize; // offsets and data size
+
+ // Calculated header size becomes a hint; being wrong only risks extra allocations.
+ // Make sure to reserve enough capacity to handle the extra bytes we'll truncate.
+ let mut bytes_to_splice = Vec::with_capacity(header_size + 3);
// Write header
let header = array_header(is_large, offset_size);
+ bytes_to_splice.push(header);
- let num_elements_bytes =
- num_elements
- .to_le_bytes()
- .into_iter()
- .take(if is_large { 4 } else { 1 });
- let offsets = PackedU32Iterator::new(
- offset_size as usize,
- self.offsets
- .clone()
- .into_iter()
- .map(|offset| (offset as u32).to_le_bytes()),
+ append_packed_u32(
+ &mut bytes_to_splice,
+ num_elements as u32,
+ if is_large { 4 } else { 1 },
Review Comment:
```suggestion
num_elements_size,
```
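For context, a rough sketch of what a helper with this call shape might do. The body is an assumption inferred from the removed `to_le_bytes().into_iter().take(..)` code, not the PR's actual `append_packed_u32`; the parameter names are hypothetical too.

```rust
/// Hypothetical sketch: append the low `value_size` bytes of `value`,
/// little-endian, to `dest`. Assumes `value_size` is between 1 and 4;
/// a larger value would panic on the slice below.
fn append_packed_u32(dest: &mut Vec<u8>, value: u32, value_size: usize) {
    // `to_le_bytes` yields the least significant byte first, so slicing the
    // prefix keeps exactly the bytes a narrower integer would occupy.
    dest.extend_from_slice(&value.to_le_bytes()[..value_size]);
}
```

With `num_elements_size` hoisted into a binding, the same value feeds both the capacity hint and this call, so the two cannot drift apart.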
##########
parquet-variant/src/builder.rs:
##########
@@ -1256,29 +1221,28 @@ impl<'a> ListBuilder<'a> {
let starting_offset = self.parent_value_offset_base;
+ let header_size = 1 + // header
+ if is_large { 4 } else { 1 } + // is_large
+ (self.offsets.len() + 1) * offset_size as usize; // offsets and data size
Review Comment:
```suggestion
(num_elements + 1) * offset_size as usize; // offsets and data size
```
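Put together, the size hint could be computed roughly as below. This is a standalone sketch, not the PR's code: `list_header_size` is a hypothetical free function, and it assumes `num_elements` equals `self.offsets.len()`, as the suggestion implies.

```rust
/// Hypothetical helper mirroring the header-size hint from the diff.
fn list_header_size(num_elements: usize, is_large: bool, offset_size: u8) -> usize {
    // Width of the element-count field: 4 bytes for large lists, else 1.
    let num_elements_size = if is_large { 4 } else { 1 };
    1 + // header byte
    num_elements_size + // num_elements
    (num_elements + 1) * offset_size as usize // offsets and data size
}
```

For example, a small list of 3 elements with 1-byte offsets needs 1 + 1 + 4 = 6 bytes of header.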