alamb opened a new issue, #140: URL: https://github.com/apache/parquet-site/issues/140
We recently added Variant to the Parquet format; see https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#encoding-types

However, the only documentation that currently exists is the low-level technical spec. The higher-level Parquet documentation does not contain anything about Variant: https://parquet.apache.org/docs/

This is inconvenient when I try to discuss adding Variant support to various systems: there is no high-level overview or link to point people at.

I would like to have a high-level summary page on parquet.apache.org that:

1. Explains the use case of Variant (semi-structured data)
2. Gives a technical overview of the encoding (with diagrams)
3. Explains how shredding works and gives some examples (with diagrams)

Some existing material to use:

1. [Slides from Accelerating Apache Parquet with metadata stores and specialized indexes using Apache DataFusion](https://docs.google.com/presentation/d/1e_Z_F8nt2rcvlNvhU11khF5lzJJVqNtqtyJ-G3mp4-Q), [recording](https://www.youtube.com/watch?v=74YsJT1-Rdk) (YouTube)
2. Original Databricks announcement: https://www.databricks.com/blog/introducing-open-variant-data-type-delta-lake-and-apache-spark
3. Databricks announcement: https://www.databricks.com/blog/introducing-variant-new-open-standard-semi-structured-data-apache-parquettm-delta-lake (this focuses quite a bit on system-level integration of Variant in Databricks and reads more like a product feature announcement)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
