alamb commented on code in PR #8031:
URL: https://github.com/apache/arrow-rs/pull/8031#discussion_r2246296972
##########
parquet-variant/src/builder.rs:
##########
@@ -1462,49 +1441,36 @@ impl<'a> ObjectBuilder<'a> {
let num_fields = self.fields.len();
let is_large = num_fields > u8::MAX as usize;
- let header_size = 1 + // header byte
- (if is_large { 4 } else { 1 }) + // num_fields
- (num_fields * id_size as usize) + // field IDs
- ((num_fields + 1) * offset_size as usize); // field offsets + data_size
+ let num_fileds_size = if is_large { 4 } else { 1 }; // is_large: 4 bytes, else 1 byte.
Review Comment:
This variable name has a typo (`num_fileds_size` should be `num_fields_size`):
```suggestion
let num_fields_size = if is_large { 4 } else { 1 }; // is_large: 4 bytes, else 1 byte.
```
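For context, the arithmetic this variable feeds into can be sketched as a standalone function. This is only an illustration reconstructed from the removed lines in the hunk above; the function name `object_header_size` and its parameter list are hypothetical and are not taken from the PR.

```rust
/// Hypothetical helper mirroring the header-size arithmetic from the hunk above.
/// `id_size` and `offset_size` are byte widths, as in the original expression.
fn object_header_size(num_fields: usize, id_size: u8, offset_size: u8) -> usize {
    let is_large = num_fields > u8::MAX as usize;
    // The field count is stored in 4 bytes for large objects, otherwise in 1 byte.
    let num_fields_size = if is_large { 4 } else { 1 };
    1 // header byte
        + num_fields_size // num_fields
        + num_fields * id_size as usize // field IDs
        + (num_fields + 1) * offset_size as usize // field offsets + data_size
}
```

With two fields and 1-byte IDs and offsets, this gives 1 + 1 + 2 + 3 = 7 bytes.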
##########
parquet-variant/src/builder.rs:
##########
@@ -64,19 +64,55 @@ fn write_offset(buf: &mut Vec<u8>, value: usize, nbytes: u8) {
buf.extend_from_slice(&bytes[..nbytes as usize]);
}
-/// Write little-endian integer to buffer at a specific position
-fn write_offset_at_pos(buf: &mut [u8], start_pos: usize, value: usize, nbytes: u8) {
- let bytes = value.to_le_bytes();
- buf[start_pos..start_pos + nbytes as usize].copy_from_slice(&bytes[..nbytes as usize]);
-}
-
/// Append `value_size` bytes of given `value` into `dest`.
fn append_packed_u32(dest: &mut Vec<u8>, value: u32, value_size: usize) {
let n = dest.len() + value_size;
dest.extend(value.to_le_bytes());
dest.truncate(n);
}
+/// An iterator that yields the bytes of a packed u32 iterator.
+/// Will yield the first `packed_bytes` bytes of each item in the iterator.
+struct PackedU32Iterator<T: Iterator<Item = [u8; 4]>> {
+ packed_bytes: usize,
Review Comment:
I think to make this really fast we will need generics keyed on the
`packed_bytes` size, so we would end up with separate versions of the code for
1-, 2-, and 4-byte offsets.
We could try to make this particular iterator generic, but it might get a
bit messy.
Maybe we could instead structure the code so that there is a function like
this, which writes the header for a given offset size:
```rust
fn write_header<const SIZE: usize>(dst: &mut Vec<u8>, ...) {
...
}
```
Then we would have a `match` like this to instantiate the appropriate
version (we can probably avoid the panic):
```rust
match int_size(max_id as usize) {
1 => write_header::<1>(dst, ...),
2 => write_header::<2>(dst, ...),
4 => write_header::<4>(dst, ...),
_ => panic!("unsupported size")
}
```
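To make the idea concrete, here is a minimal self-contained sketch of the same pattern. It is an assumption-laden illustration, not the PR's code: `write_packed` and `write_packed_dyn` are hypothetical names, the input is simplified to a slice of `u32`, and returning a `Result` is just one possible way to avoid the panic.

```rust
/// Append the low `SIZE` bytes of each value in little-endian order.
/// `SIZE` is a compile-time constant, so each instantiation copies a
/// fixed-width slice. Only meant to be instantiated with SIZE <= 4.
fn write_packed<const SIZE: usize>(dst: &mut Vec<u8>, values: &[u32]) {
    for v in values {
        dst.extend_from_slice(&v.to_le_bytes()[..SIZE]);
    }
}

/// Choose the instantiation once, outside the per-value loop.
/// Unsupported sizes are reported as an error instead of panicking.
fn write_packed_dyn(dst: &mut Vec<u8>, values: &[u32], size: u8) -> Result<(), String> {
    match size {
        1 => write_packed::<1>(dst, values),
        2 => write_packed::<2>(dst, values),
        4 => write_packed::<4>(dst, values),
        _ => return Err(format!("unsupported size {size}")),
    }
    Ok(())
}
```

The runtime size is inspected once per call, and the loop body in each instantiation works with a constant width. Whether this is actually faster than the iterator approach would need to be confirmed with a benchmark.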