Re: [I] [Variant] API to construct Shredded Variant Arrays [arrow-rs]

via GitHub Fri, 18 Jul 2025 13:58:08 -0700


scovich commented on issue #7895:
URL: https://github.com/apache/arrow-rs/issues/7895#issuecomment-3090705369


   > We would need this schema:
   > 
   > ```
   > STRUCT {
   >   metadata: BinaryView,
   >   value: BinaryView,
   >   typed_value: STRUCT {
   >     foo: Int64,
   >     bar: Int32
   >   }
   > }
   > ```
   
   We need to decide whether the shredding schema should match what we will 
physically write, or whether it is a logical schema for convenience. The 
actual physical shredding schema for the above would be:
   ```
   STRUCT {
     metadata: BINARY,
     value: BINARY,
     typed_value: STRUCT {
       foo: STRUCT {
         value: BINARY,
         typed_value: Int64,
       },
       bar: STRUCT {
         value: BINARY,
         typed_value: Int32,
       },
     },
   }
   ```
   (We could debate whether a missing `value` column is a request from the user 
to drop all values that don't shred properly, but that seems like a massive 
footgun. I'd rather let the perfect shredding case encode an all-null `value` 
column and let the parquet writer drop the column if it wants to.)
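
To make the logical-vs-physical distinction concrete, here is a minimal sketch of how a logical schema could be expanded into the physical one shown above. Everything in it is an assumption for illustration: `Ty`, `physical_typed_value`, `shredded_group`, and `physical_variant_schema` are hypothetical names, `Ty` is a simplified stand-in rather than the arrow-rs `DataType`, and lists are not handled.

```rust
/// Hypothetical, simplified stand-in for an Arrow data type.
/// This is NOT the arrow-rs `DataType`; it only models what the example needs.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Binary,
    Int32,
    Int64,
    Struct(Vec<(String, Ty)>),
}

/// Expand a logical shredding type into the physical type of a `typed_value` column.
/// Primitive types are kept as-is; each field of a struct becomes a shredded group.
fn physical_typed_value(logical: &Ty) -> Ty {
    match logical {
        Ty::Struct(fields) => Ty::Struct(
            fields
                .iter()
                .map(|(name, ty)| (name.clone(), shredded_group(ty)))
                .collect(),
        ),
        other => other.clone(),
    }
}

/// A shredded field is a group holding a binary `value` fallback (for values
/// that do not shred) next to the strongly typed `typed_value`.
fn shredded_group(logical: &Ty) -> Ty {
    Ty::Struct(vec![
        ("value".to_string(), Ty::Binary),
        ("typed_value".to_string(), physical_typed_value(logical)),
    ])
}

/// Build the top-level physical schema: `metadata`, `value`, and the expanded
/// `typed_value`. The `value` column is always present, matching the preference
/// above for never dropping the fallback at the schema level.
fn physical_variant_schema(logical_typed_value: &Ty) -> Ty {
    Ty::Struct(vec![
        ("metadata".to_string(), Ty::Binary),
        ("value".to_string(), Ty::Binary),
        ("typed_value".to_string(), physical_typed_value(logical_typed_value)),
    ])
}
```

Calling `physical_variant_schema` with the logical `STRUCT { foo: Int64, bar: Int32 }` yields the nested layout above, with a `value`/`typed_value` pair under both `foo` and `bar`. If the API accepted the logical form, an expansion along these lines would have to happen somewhere before writing.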


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
