scovich commented on issue #7895:
URL: https://github.com/apache/arrow-rs/issues/7895#issuecomment-3090705369

   > We would need this schema:
   > 
   > ```
   > STRUCT {
   >   metadata: BinaryView,
   >   value: BinaryView,
   >   typed_value: STRUCT {
   >     foo: Int64,
   >     bar: Int32
   >   }
   > }
   > ```
   
   We need to decide whether the shredding schema should match what we will 
physically write, or whether it's a logical schema for convenience. The 
distinction matters because the actual physical shredding schema for the above 
would be:
   ```
   STRUCT {
     metadata: BINARY,
     value: BINARY,
     typed_value: STRUCT {
       foo: STRUCT {
         value: BINARY,
         typed_value: Int64,
       },
       bar: STRUCT {
         value: BINARY,
         typed_value: Int32,
       },
     },
   }
   ```
   (We could debate whether a missing `value` column is a request from the user 
to drop all values that don't shred properly, but that seems like a massive 
footgun. I'd rather let the perfect shredding case encode an all-null `value` 
column and let the parquet writer drop the column if it wants to.)
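
   To make the logical-vs-physical distinction concrete, here is a minimal 
sketch of how a logical schema could be expanded into the physical one above. 
Everything in it is hypothetical: `Ty`, `shredded_group`, and `physical_schema` 
are illustration-only names, not arrow-rs or parquet APIs, and the sketch 
assumes every shredded field (nested or not) gets its own `value` column.

   ```rust
   /// Hypothetical, simplified type model (NOT the arrow-rs `DataType`).
   #[derive(Debug, Clone, PartialEq)]
   enum Ty {
       Binary,
       Int32,
       Int64,
       Struct(Vec<(String, Ty)>),
   }

   /// Expand one logical type into its physical `{ value, typed_value }` pair.
   /// Struct fields are expanded recursively; leaf types are kept as-is.
   fn shredded_group(logical: &Ty) -> Vec<(String, Ty)> {
       let typed = match logical {
           Ty::Struct(fields) => Ty::Struct(
               fields
                   .iter()
                   .map(|(name, ty)| (name.clone(), Ty::Struct(shredded_group(ty))))
                   .collect(),
           ),
           leaf => leaf.clone(),
       };
       vec![
           // Fallback column for values that don't shred; may end up all-null.
           ("value".to_string(), Ty::Binary),
           ("typed_value".to_string(), typed),
       ]
   }

   /// Build the top-level physical schema: `metadata` plus the shredded pair.
   fn physical_schema(logical_typed_value: &Ty) -> Ty {
       let mut fields = vec![("metadata".to_string(), Ty::Binary)];
       fields.extend(shredded_group(logical_typed_value));
       Ty::Struct(fields)
   }
   ```

   Passing the logical `typed_value` struct `{ foo: Int64, bar: Int32 }` to 
`physical_schema` yields the nested layout shown above, with a `value` column 
always present at every level rather than being left to the user to request.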


