Parquet only provides a limited set of primitive types as building blocks.
Although more original types (also called converted types in some
contexts) can be added to represent additional application-level data
types, the type system is not open to extension by end users.
Basically, you need to map your own application data types to and from
Parquet types and do the conversion at the application level. One example
is the user-defined types (UDTs) in Spark SQL: we first map UDTs to basic
Spark SQL data types, then convert those Spark SQL data types to Parquet
types via a standard schema converter.
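To make the two-step idea concrete, here is a minimal sketch with no Spark
or Parquet dependency. The names Point, toBasic, and fromBasic are
hypothetical, not Spark's actual UDT API: the application type is first
mapped to basic types (here, two doubles), which a standard schema
converter would then map onto Parquet primitives (DOUBLE).

```java
public class UdtMappingSketch {
    // A hypothetical application-level type.
    record Point(double x, double y) {}

    // Application-level "serialize": custom type -> basic representation.
    static double[] toBasic(Point p) {
        return new double[] { p.x(), p.y() };
    }

    // Application-level "deserialize": basic representation -> custom type.
    static Point fromBasic(double[] a) {
        return new Point(a[0], a[1]);
    }

    public static void main(String[] args) {
        Point p = new Point(1.0, 2.0);
        // Round-trip through the basic representation preserves the value;
        // only the basic form ever needs a Parquet schema.
        Point back = fromBasic(toBasic(p));
        System.out.println(p.equals(back));
    }
}
```

Spark's own UDT support follows the same shape: the UDT declares which
basic SQL type it maps to, and the existing SQL-to-Parquet schema
converter handles the rest.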
Cheng
On 9/7/15 10:26 PM, Edmon Begoli wrote:
Is there a learning resource, and if so what is the best one, that would
help me understand how to canonically map the currently unsupported,
nested structured data formats into Parquet?
Ideally, I would like access to something that shows the process step by
step, or that gives enough background to explain how to do it.
If no such thing exists, maybe you can point me to some basic examples
that I could follow to learn the process.
I will work to contribute back any tutorials and documentation I produce
for my own and my team's use (as well as any code I produce).
Thank you,
Edmon