aihuaxu commented on code in PR #461:
URL: https://github.com/apache/parquet-format/pull/461#discussion_r1854863959
##########
VariantEncoding.md:
##########
@@ -39,13 +39,41 @@ Another motivation for the representation is that (aside from metadata) each nes
 For example, in a Variant containing an Array of Variant values, the representation of an inner Variant value, when paired with the metadata of the full variant, is itself a valid Variant.

 This document describes the Variant Binary Encoding scheme.
-[VariantShredding.md](VariantShredding.md) describes the details of the Variant shredding scheme.
+The [Variant Shredding specification](VariantShredding.md) describes the details of shredding Variant values as typed Parquet columns.
+
+## Variant in Parquet
-# Variant in Parquet
 A Variant value in Parquet is represented by a group with 2 fields, named `value` and `metadata`.
-Both fields `value` and `metadata` are of type `binary`, and cannot be `null`.
-# Metadata encoding
+* The Variant group must be annotated with the `VARIANT` logical type.

Review Comment:
   In Spark's implementation of Variant, `value` and `metadata` are internally limited to 16MB, so there is a limit on how much data can be stored. Different engines may have different internal limits, so some Variant values may not be loadable across engines. I'm wondering if we should state such a limit explicitly in the spec. Of course, it is hard to define such a limit precisely, since input data of the same size may be encoded differently.
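For concreteness, the group described in the new spec text corresponds to a Parquet schema along these lines (a minimal sketch in the schema notation the spec uses elsewhere; `var_col` is a placeholder column name, and the outer group's repetition depends on whether the column itself is nullable):

    optional group var_col (VARIANT) {
      required binary metadata;
      required binary value;
    }

Marking both inner fields `required` reflects the rule quoted above that neither `value` nor `metadata` can be `null`.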
