Re: [PR] WIP: Current work on Variant specs [parquet-format]

via GitHub Fri, 25 Oct 2024 00:09:40 -0700


RussellSpitzer commented on code in PR #461:
URL: https://github.com/apache/parquet-format/pull/461#discussion_r1815648837



##########
VariantShredding.md:
##########
@@ -33,176 +33,239 @@ This document focuses on the shredding semantics, Parquet representation, implic
 For now, it does not discuss which fields to shred, user-facing API changes, or any engine-specific considerations like how to use shredded columns.
 The approach builds upon the [Variant Binary Encoding](VariantEncoding.md), and leverages the existing Parquet specification.
 
-At a high level, we replace the `value` field of the Variant Parquet group with one or more fields called `object`, `array`, `typed_value`, and `variant_value`.
-These represent a fixed schema suitable for constructing the full Variant value for each row.
-
 Shredding allows a query engine to reap the full benefits of Parquet's columnar representation, such as more compact data encoding, min/max statistics for data skipping, and I/O and CPU savings from pruning unnecessary fields not accessed by a query (including the non-shredded Variant binary data).

Review Comment:
   This is another place where I'd like to remove some of the text. My main
goal here is just to reduce the amount of text in the spec.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

