[I] Add documentation about Variant to the website [parquet-site]


alamb opened a new issue, #140:
URL: https://github.com/apache/parquet-site/issues/140

We recently added Variant to the Parquet format; see
https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#encoding-types

However, the only documentation that currently exists is the low-level
technical spec. The higher-level Parquet documentation does not mention
Variant at all: https://parquet.apache.org/docs/

This is inconvenient as I try to discuss adding Variant support to various
systems: there is no high-level overview or link to point people at.

I would like to have a high-level summary page on parquet.apache.org that:
1. Explains the use case of Variant (semi-structured data)
2. Gives a technical overview of the encoding (with diagrams)
3. Explains how shredding works and gives some examples (with diagrams)

Some existing material to use:
1. Slides from [Accelerating Apache Parquet with metadata stores and
specialized indexes using Apache DataFusion](https://docs.google.com/presentation/d/1e_Z_F8nt2rcvlNvhU11khF5lzJJVqNtqtyJ-G3mp4-Q),
plus the [recording](https://www.youtube.com/watch?v=74YsJT1-Rdk) (YouTube)
2. Original Databricks announcement:
https://www.databricks.com/blog/introducing-open-variant-data-type-delta-lake-and-apache-spark
3. Databricks announcement:
https://www.databricks.com/blog/introducing-variant-new-open-standard-semi-structured-data-apache-parquettm-delta-lake
(this focuses heavily on the system-level integration of Variant in Databricks
and reads more like a product feature announcement)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
