alamb opened a new issue, #147: URL: https://github.com/apache/parquet-site/issues/147
- Part of https://github.com/apache/parquet-site/issues/146

We recently added Variant to the Parquet format (see https://parquet.apache.org/docs/file-format/types/variantencoding/).

However, the only documentation that currently exists is the low-level technical spec. This makes it somewhat awkward to explain to others how we should add Variant support to various systems: there is no high-level overview or link to point people at.

I suggest a blog post with a high-level overview that:
1. Explains the use case of Variant (semi-structured data)
2. Gives a technical overview of the encoding (with diagrams)
3. Explains how shredding works and gives some examples (with diagrams)

I would love a story arc that also highlights the fact that this is a major new addition to the Parquet spec and that it has already seen wide adoption: https://parquet.apache.org/docs/file-format/implementationstatus/

Some existing material we should feel free to re-use:
1. [Slides from Accelerating Apache Parquet with metadata stores and specialized indexes using Apache DataFusion](https://docs.google.com/presentation/d/1e_Z_F8nt2rcvlNvhU11khF5lzJJVqNtqtyJ-G3mp4-Q), [recording](https://www.youtube.com/watch?v=74YsJT1-Rdk) (YouTube)
2. [Introduction to Iceberg / Parquet Variant NYC Apache Iceberg™ Community Meetup](https://docs.google.com/presentation/d/1NN583KuJ3nelIrrH64HAASmbRtO9khhNSO3fw0_Nslc)
3. Original Databricks Variant announcement: https://www.databricks.com/blog/introducing-open-variant-data-type-delta-lake-and-apache-spark
4. Databricks announcement: https://www.databricks.com/blog/introducing-variant-new-open-standard-semi-structured-data-apache-parquettm-delta-lake (this focuses quite a bit on system-level integration of Variant in Databricks and reads more like a product feature announcement than a technical overview)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
