[I] Variant Blog Post [parquet-site]

via GitHub Fri, 09 Jan 2026 04:09:13 -0800


alamb opened a new issue, #147:
URL: https://github.com/apache/parquet-site/issues/147


   - Part of https://github.com/apache/parquet-site/issues/146
   
   We recently added Variant to the Parquet format; see 
https://parquet.apache.org/docs/file-format/types/variantencoding/
   
   However, the only documentation that currently exists is the low-level 
technical spec. 
   
   This makes it somewhat awkward to explain to others how to add Variant 
support to various systems: there is no high-level overview or link to point 
people at.
   
   I suggest a blog post with a high-level overview that:
   1. Explains the use case for Variant (semi-structured data)
   2. Gives a technical overview of the encoding (with diagrams)
   3. Explains how shredding works and gives some examples (with diagrams)
   
   I would love a story arc that also highlights that this is a major 
new addition to the Parquet spec and that it has already seen wide adoption; see 
https://parquet.apache.org/docs/file-format/implementationstatus/
   
   
   Some existing material we should feel free to re-use:
   1. [Slides from "Accelerating Apache Parquet with metadata stores and 
specialized indexes using Apache DataFusion"](https://docs.google.com/presentation/d/1e_Z_F8nt2rcvlNvhU11khF5lzJJVqNtqtyJ-G3mp4-Q),
 [recording on YouTube](https://www.youtube.com/watch?v=74YsJT1-Rdk)
   2. [Introduction to Iceberg / Parquet Variant NYC Apache Iceberg™ Community 
Meetup](https://docs.google.com/presentation/d/1NN583KuJ3nelIrrH64HAASmbRtO9khhNSO3fw0_Nslc)
   3. Original Databricks Variant announcement: 
https://www.databricks.com/blog/introducing-open-variant-data-type-delta-lake-and-apache-spark
   4. Databricks announcement: 
https://www.databricks.com/blog/introducing-variant-new-open-standard-semi-structured-data-apache-parquettm-delta-lake
 (this focuses quite a bit on system-level integration of Variant in Databricks 
and reads more like a product feature announcement than a technical 
overview)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

