Understood.

I would not be defining new types, but new standard nested structures, so
for that I probably don't need to modify Parquet at all.

For doing the actual layout conversions and defining required vs. optional
fields, etc., would you suggest Avro or Thrift as a good medium for this?

Something like:
https://github.com/adobe-research/spark-parquet-thrift-example
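
To make it concrete, here is the kind of thing I had in mind for the Avro
route (a rough, untested sketch on my side; the Person/Address names, the
output path, and the parquet-avro 1.8.x constructor are just my
assumptions):

import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericRecord}
import org.apache.hadoop.fs.Path
import org.apache.parquet.avro.AvroParquetWriter

object AvroToParquetSketch {
  // Nested record with one nullable field, declared in Avro.
  val schemaJson =
    """{
      |  "type": "record", "name": "Person",
      |  "fields": [
      |    {"name": "name", "type": "string"},
      |    {"name": "address", "type": {
      |      "type": "record", "name": "Address",
      |      "fields": [
      |        {"name": "city", "type": "string"},
      |        {"name": "zip",  "type": ["null", "string"], "default": null}
      |      ]
      |    }}
      |  ]
      |}""".stripMargin

  def main(args: Array[String]): Unit = {
    val schema = new Schema.Parser().parse(schemaJson)
    val addressSchema = schema.getField("address").schema()

    val address = new GenericData.Record(addressSchema)
    address.put("city", "Springfield")
    address.put("zip", null) // optional field left empty

    val person = new GenericData.Record(schema)
    person.put("name", "alice")
    person.put("address", address)

    // My understanding is that parquet-avro maps the nested record to a
    // Parquet group and the ["null", "string"] union to an optional field.
    val writer = new AvroParquetWriter[GenericRecord](
      new Path("/tmp/person.parquet"), schema)
    writer.write(person)
    writer.close()
  }
}

Does that look like a reasonable starting point, or would Thrift be the
better fit for expressing required vs. optional?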

On Tue, Sep 8, 2015 at 10:59 AM, Cheng Lian <[email protected]> wrote:

> Parquet only provides a limited set of types as building blocks. Although
> we can add more original types (also called converted types in some
> contexts) to represent more application-level data types, the type system
> is not open to extension for end users.
>
> Basically, you need to map your own application data types to and from
> Parquet types and do the conversion at the application level. One example
> is the user-defined types in Spark SQL. We first map UDTs to basic Spark
> SQL data types, then convert Spark SQL data types to Parquet types via a
> standard schema converter.
>
> Cheng
>
>
> On 9/7/15 10:26 PM, Edmon Begoli wrote:
>
>> What is the best learning resource, if any, that would help me understand
>> how to canonically map currently unsupported, nested structured data
>> formats into Parquet?
>>
>> Ideally, I would like access to something that shows the process step by
>> step or gives enough background to explain how to do it.
>>
>> If no such thing exists, maybe you can point me to some basic examples
>> that I could follow to learn the process.
>>
>> I will work to contribute back any tutorials and documentation I produce
>> for my own and my team's use (as well as any code I produce).
>>
>> Thank you,
>> Edmon
>>
>>
>
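
P.S. To make sure I understand the Spark SQL route you describe above: if I
model the nested structure with plain case classes and use Option[...] for
the optional fields, the standard schema converter should lay out the nested
Parquet groups for me, correct? A rough sketch of what I mean (untested,
against Spark 1.5; all names and data are made up):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Application-level nested structure; my assumption is that Option[...]
// becomes an optional field and a nested case class becomes a nested group.
case class Address(city: String, zip: Option[String])
case class Person(name: String, address: Address)

object NestedParquetSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("nested-parquet-sketch").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val df = sc.parallelize(Seq(
      Person("alice", Address("Springfield", Some("00000"))),
      Person("bob", Address("Shelbyville", None))
    )).toDF()

    df.printSchema()                        // the Spark SQL struct types
    df.write.parquet("/tmp/people.parquet") // converter emits the Parquet layout
    sc.stop()
  }
}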
