Re: List of Additions to Parquet 2

Wes McKinney Thu, 16 Jun 2016 13:20:23 -0700

To add a bit of context, we're looking at the handling of integers
other than INT32 and INT64 from the perspective of Apache Arrow. It
seems that in Parquet 1 files, you may not be able to recover the
original integer types from the file alone. The question is, should we
put this metadata in the Parquet file? See
https://github.com/apache/arrow/pull/89/files#diff-147a93dad8a2dfdac5531007c5c686b1R67
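To make the problem concrete, here is a minimal Python sketch. It assumes the writer may or may not record an annotation modeled on the ConvertedType names in parquet.thrift (INT_8, UINT_16, ...). The function recover_int_type and the Arrow-style type strings are hypothetical illustrations, not parquet-cpp or Arrow APIs.

```python
from typing import Optional

# Parquet physical integer storage types; a reader always sees one of these.
PHYSICAL_INT_TYPES = {"INT32", "INT64"}

# Hypothetical annotation table modeled on the ConvertedType names in
# parquet.thrift, mapped to (is_signed, bit_width).
ANNOTATION_TO_INT = {
    "INT_8": (True, 8),
    "INT_16": (True, 16),
    "INT_32": (True, 32),
    "INT_64": (True, 64),
    "UINT_8": (False, 8),
    "UINT_16": (False, 16),
    "UINT_32": (False, 32),
    "UINT_64": (False, 64),
}


def recover_int_type(physical: str, annotation: Optional[str]) -> str:
    """Return an Arrow-style type name such as "int8" or "uint16".

    Without an annotation the reader can only echo the physical width,
    so a column written from int8 data comes back as int32.
    """
    if physical not in PHYSICAL_INT_TYPES:
        raise ValueError(f"not an integer physical type: {physical}")
    if annotation is None:
        # No metadata: the original, narrower type is unrecoverable.
        return physical.lower()
    signed, width = ANNOTATION_TO_INT[annotation]
    # 8/16/32-bit annotations belong on INT32 storage, 64-bit on INT64.
    expected = "INT64" if width == 64 else "INT32"
    if physical != expected:
        raise ValueError(f"{annotation} cannot annotate {physical}")
    return ("int" if signed else "uint") + str(width)
```

With an annotation, recover_int_type("INT32", "INT_8") gives "int8". Without one, recover_int_type("INT32", None) can only give "int32", which is the information loss in question.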


If adding this metadata may cause problems, we can leave the physical
storage type as is and leave it to users to explicitly cast to another
integer type on deserialization.
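The explicit-cast fallback could look roughly like the following Python sketch. cast_ints is a hypothetical helper, not a parquet-cpp or Arrow API. It narrows values read from an INT32 or INT64 column to a width the user chooses, and it rejects out-of-range values instead of silently wrapping them.

```python
from typing import Iterable, List


def cast_ints(values: Iterable[int], signed: bool, width: int) -> List[int]:
    """Narrow integers read from an INT32/INT64 column to a smaller type.

    Raises OverflowError instead of silently wrapping when a value
    falls outside the target type's range.
    """
    if signed:
        lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    else:
        lo, hi = 0, (1 << width) - 1
    result = []
    for v in values:
        if not lo <= v <= hi:
            kind = "int" if signed else "uint"
            raise OverflowError(f"{v} does not fit in {kind}{width}")
        result.append(v)
    return result
```

For example, cast_ints([1, -128, 127], signed=True, width=8) returns the values unchanged, while cast_ints([300], signed=True, width=8) raises OverflowError. Failing loudly is one design choice here. A wrapping cast would be cheaper but could hide corrupted data.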

Thanks,
Wes

On Thu, Jun 16, 2016 at 12:57 PM, Uwe Korn wrote:
> Hello,
>
> I'm currently looking at the differences between Parquet 1 and Parquet 2 to
> implement these versions as a switch in parquet-cpp. The only list I could
> find is the rather sparse changelog [1]. Is there a better list, or do I
> need to go through the referenced changeset entries myself to find the
> actual differences? (If the latter is the case, I'd also make a PR
> afterwards that augments the documentation with "(since version 2.0)"
> markings.) But I'm hoping there is a blog post or similar out there that
> could make my life easier.
>
> Thanks,
>
> Uwe
>
> [1] https://github.com/apache/parquet-format/blob/master/CHANGES.md
>
