To add a bit of context: we're looking at the handling of integers other than INT32 and INT64 from the perspective of Apache Arrow. It seems that in Parquet 1 files, you may not be able to recover the original integer types from the file alone. The question is whether we should store this metadata in the Parquet file. See https://github.com/apache/arrow/pull/89/files#diff-147a93dad8a2dfdac5531007c5c686b1R67
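To make the concern concrete, here is a minimal Python sketch of the round trip. All names in it (PHYSICAL_FOR, write_column, read_column) are hypothetical and for illustration only; this is not the Arrow or parquet-cpp API, and the byte layout is a stand-in for real Parquet encoding. It assumes a narrow integer column is widened to the INT32 physical type and that nothing in the file records the original type.

```python
import struct

# Hypothetical tables for illustration; not taken from Arrow or parquet-cpp.
STRUCT_CODE = {"INT32": "i", "INT64": "q"}
PHYSICAL_FOR = {
    "int8": "INT32", "uint8": "INT32",
    "int16": "INT32", "uint16": "INT32",
    "int32": "INT32", "int64": "INT64",
}
VALUE_RANGE = {
    "int8": (-2**7, 2**7 - 1), "uint8": (0, 2**8 - 1),
    "int16": (-2**15, 2**15 - 1), "uint16": (0, 2**16 - 1),
    "int32": (-2**31, 2**31 - 1), "int64": (-2**63, 2**63 - 1),
}


def write_column(values, original_type):
    """Widen the values to the physical type; the original type is dropped."""
    physical = PHYSICAL_FOR[original_type]
    fmt = "<%d%s" % (len(values), STRUCT_CODE[physical])
    return physical, struct.pack(fmt, *values)


def read_column(physical, payload, cast_to=None):
    """Decode the payload; narrow the type only if the caller asks for it."""
    code = STRUCT_CODE[physical]
    count = len(payload) // struct.calcsize("<" + code)
    values = list(struct.unpack("<%d%s" % (count, code), payload))
    if cast_to is None:
        # Without extra metadata, the physical type is all the reader knows.
        return physical.lower(), values
    low, high = VALUE_RANGE[cast_to]
    for value in values:
        if not low <= value <= high:
            raise OverflowError("%d does not fit in %s" % (value, cast_to))
    return cast_to, values


physical, payload = write_column([1, -2, 127], "int8")
print(read_column(physical, payload))                  # type comes back as int32
print(read_column(physical, payload, cast_to="int8"))  # explicit user cast
```

The first read returns the column as int32 because that is the only type information left in the payload; the second shows the alternative of having the user cast explicitly on deserialization.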
If storing that metadata may cause problems, we can leave the physical storage type as is and leave it to users to cast explicitly to another integer type on deserialization.

Thanks,
Wes

On Thu, Jun 16, 2016 at 12:57 PM, Uwe Korn <[email protected]> wrote:
> Hello,
>
> I'm currently looking at the differences between Parquet 1 and Parquet 2 to
> implement these versions as a switch in parquet-cpp. The only list I could
> find is the rather undetailed changelog [1]. Is there maybe some better list
> or do I need to go through the referenced changeset entries myself to find
> the actual differences? (If the latter is the case, I'd also make a PR
> afterwards that augments the documentation with some "(since version 2.0)"
> markings.) But I'm hoping a bit that there is some blog post or so out there
> that could make my life easier.
>
> Thanks,
>
> Uwe
>
> [1] https://github.com/apache/parquet-format/blob/master/CHANGES.md
