scovich commented on issue #7895:
URL: https://github.com/apache/arrow-rs/issues/7895#issuecomment-3090705369
> We would need this schema:
>
> ```
> STRUCT {
>   metadata: BinaryView,
>   value: BinaryView,
>   typed_value: STRUCT {
>     foo: Int64,
>     bar: Int32
>   }
> }
> ```
We need to decide whether the shredding schema should match what we will
physically write, or whether it's a logical schema provided for convenience.
The actual physical shredding schema for the above would be:
```
STRUCT {
  metadata: BINARY,
  value: BINARY,
  typed_value: STRUCT {
    foo: STRUCT {
      value: BINARY,
      typed_value: Int64,
    },
    bar: STRUCT {
      value: BINARY,
      typed_value: Int32,
    },
  },
}
```
(We could debate whether a missing `value` column is a request from the user
to drop all values that don't shred properly, but that seems like a massive
footgun. I'd rather let the perfect shredding case encode an all-null `value`
column and let the parquet writer drop the column if it wants to.)
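
To make the logical-versus-physical distinction concrete, here is a minimal sketch of how a logical shredding schema could be expanded into the physical layout shown above. It assumes a simplified, hypothetical `Ty` enum standing in for Arrow's data types; `Ty`, `shred_field`, `shred_typed_value`, and `physical_schema` are illustrative names and not part of the arrow-rs API.

```rust
// Hypothetical, simplified stand-in for an Arrow data type (not the arrow-rs API).
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Binary,
    Int32,
    Int64,
    Struct(Vec<(String, Ty)>),
}

/// Wrap one logical field in the `{ value, typed_value }` pair that a
/// shredded field occupies in the physical schema.
fn shred_field(logical: &Ty) -> Ty {
    Ty::Struct(vec![
        ("value".to_string(), Ty::Binary),
        ("typed_value".to_string(), shred_typed_value(logical)),
    ])
}

/// Expand a logical `typed_value` type: each child of a struct becomes a
/// shredded field, while any other type is kept as-is.
fn shred_typed_value(logical: &Ty) -> Ty {
    match logical {
        Ty::Struct(fields) => Ty::Struct(
            fields
                .iter()
                .map(|(name, ty)| (name.clone(), shred_field(ty)))
                .collect(),
        ),
        other => other.clone(),
    }
}

/// Build the top-level physical schema: `metadata`, `value`, and the
/// expanded `typed_value`.
fn physical_schema(logical_typed_value: &Ty) -> Ty {
    Ty::Struct(vec![
        ("metadata".to_string(), Ty::Binary),
        ("value".to_string(), Ty::Binary),
        ("typed_value".to_string(), shred_typed_value(logical_typed_value)),
    ])
}
```

Passing the logical `typed_value` struct `{ foo: Int64, bar: Int32 }` to `physical_schema` yields the nested layout above, with a `value` column always present next to each `typed_value`. Whether that column is later dropped when it is all-null would be left to the writer in this sketch.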