alamb commented on code in PR #8031: URL: https://github.com/apache/arrow-rs/pull/8031#discussion_r2246296972
########## parquet-variant/src/builder.rs: ########## @@ -1462,49 +1441,36 @@ impl<'a> ObjectBuilder<'a> { let num_fields = self.fields.len(); let is_large = num_fields > u8::MAX as usize; - let header_size = 1 + // header byte - (if is_large { 4 } else { 1 }) + // num_fields - (num_fields * id_size as usize) + // field IDs - ((num_fields + 1) * offset_size as usize); // field offsets + data_size + let num_fileds_size = if is_large { 4 } else { 1 }; // is_large: 4 bytes, else 1 byte. Review Comment: This variable name seems to have a typo: ```suggestion let num_fields_size = if is_large { 4 } else { 1 }; // is_large: 4 bytes, else 1 byte. ``` ########## parquet-variant/src/builder.rs: ########## @@ -64,19 +64,55 @@ fn write_offset(buf: &mut Vec<u8>, value: usize, nbytes: u8) { buf.extend_from_slice(&bytes[..nbytes as usize]); } -/// Write little-endian integer to buffer at a specific position -fn write_offset_at_pos(buf: &mut [u8], start_pos: usize, value: usize, nbytes: u8) { - let bytes = value.to_le_bytes(); - buf[start_pos..start_pos + nbytes as usize].copy_from_slice(&bytes[..nbytes as usize]); -} - /// Append `value_size` bytes of given `value` into `dest`. fn append_packed_u32(dest: &mut Vec<u8>, value: u32, value_size: usize) { let n = dest.len() + value_size; dest.extend(value.to_le_bytes()); dest.truncate(n); } +/// An iterator that yields the bytes of a packed u32 iterator. +/// Will yield the first `packed_bytes` bytes of each item in the iterator. +struct PackedU32Iterator<T: Iterator<Item = [u8; 4]>> { + packed_bytes: usize, Review Comment: I think to get this really fast we will need to use generics based on the packed_bytes size -- so we would end up with different versions of the code for 1,2 and 4 byte offsets We could try to make this particular iterator generic, but it might get a bit messy. I was thinking maybe we can somehow structure the code so there is a function like this that writes the header for a certain offset size: ```rust fn write_header<const SIZE: usize>(dst: &mut Vec<u8>, ...) { ... } ``` Then we would basically have a switch like this to instantiate the appropriate versions (can probably avoid the panic) ```rust match int_size(max_id as usize) { 1 => write_header::<1>(dst, ...), 2 => write_header::<2>(dst, ...), 4 => write_header::<4>(dst, ...), _ => panic!("unsupported size") } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org