alamb commented on code in PR #8031:
URL: https://github.com/apache/arrow-rs/pull/8031#discussion_r2246296972


##########
parquet-variant/src/builder.rs:
##########
@@ -1462,49 +1441,36 @@ impl<'a> ObjectBuilder<'a> {
         let num_fields = self.fields.len();
         let is_large = num_fields > u8::MAX as usize;
 
-        let header_size = 1 + // header byte
-            (if is_large { 4 } else { 1 }) + // num_fields
-            (num_fields * id_size as usize) + // field IDs
-            ((num_fields + 1) * offset_size as usize); // field offsets + 
data_size
+        let num_fileds_size = if is_large { 4 } else { 1 }; // is_large: 4 
bytes, else 1 byte.

Review Comment:
   This variable name seems to have a typo:
   
   ```suggestion
           let num_fields_size = if is_large { 4 } else { 1 }; // is_large: 4 
bytes, else 1 byte.
   ```



##########
parquet-variant/src/builder.rs:
##########
@@ -64,19 +64,55 @@ fn write_offset(buf: &mut Vec<u8>, value: usize, nbytes: 
u8) {
     buf.extend_from_slice(&bytes[..nbytes as usize]);
 }
 
-/// Write little-endian integer to buffer at a specific position
-fn write_offset_at_pos(buf: &mut [u8], start_pos: usize, value: usize, nbytes: 
u8) {
-    let bytes = value.to_le_bytes();
-    buf[start_pos..start_pos + nbytes as 
usize].copy_from_slice(&bytes[..nbytes as usize]);
-}
-
 /// Append `value_size` bytes of given `value` into `dest`.
 fn append_packed_u32(dest: &mut Vec<u8>, value: u32, value_size: usize) {
     let n = dest.len() + value_size;
     dest.extend(value.to_le_bytes());
     dest.truncate(n);
 }
 
+/// An iterator that yields the bytes of a packed u32 iterator.
+/// Will yield the first `packed_bytes` bytes of each item in the iterator.
+struct PackedU32Iterator<T: Iterator<Item = [u8; 4]>> {
+    packed_bytes: usize,

Review Comment:
   I think to get this really fast we will need to use generics based on the 
packed_bytes size -- so we would end up with different versions of the code for 
1,2 and 4 byte offsets
   
   We could try to make this particular iterator generic, but it might get a 
bit messy.
   
   I was thinking maybe we can somehow structure the code so there is a 
function like this that writes the header for a certain offset size:
   
   ```rust
   fn write_header<const SIZE: usize>(dst: &mut Vec<u8>, ...) {
     ...
   }
   ```
   
   
   Then we would basically have a switch like this to instantiate the 
appropriate versions (can probably avoid the panic)
   
   ```rust
   match  int_size(max_id as usize) {
     1 => write_header::<1>(dst, ...),
     2 => write_header::<2>(dst, ...),
     4 => write_header::<4>(dst, ...),
     _ => panic!("unsupported size")
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to