On 03/03/2015 05:32 AM, Guillaume Polaert wrote:
Hi,

Is there someone working on Int96/Impala compatibility for parquet-pig?

If I understand correctly, Parquet doesn't have a nanosecond-precision
Timestamp class that can handle this format. Is that right?

At the least, we could provide a feature to map Int96 to DateTime (pigloader) and
vice versa, losing nanosecond precision of course.
What do you think?

I don't think that the Parquet community should add support for int96 timestamps. The int96 timestamp format is undocumented, though implemented in Hive and Impala. It also uses an unannotated int96, so there is no way to distinguish between a real int96 and a timestamp.

I don't think that a file format like Parquet should add support for undocumented types that are specific to an application. Applications are free to store data in Parquet's types as they like by keeping additional metadata (the column's timestamp type in Impala), but the format should only coordinate those higher-level types through annotations.

I think support for Impala's int96 timestamp should be done in a UDF as was suggested on PARQUET-195, and we should add a nanosecond-precision timestamp type annotation to coordinate future uses.
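For illustration, the conversion such a UDF would perform is mostly byte unpacking. The sketch below is in Python rather than Pig/Java, and it assumes the layout commonly described for Impala and Hive: 12 little-endian bytes, with the first 8 holding nanoseconds within the day and the last 4 holding the Julian day number. That layout is not part of the Parquet specification, so treat it as an assumption. The function names are hypothetical, values are treated as UTC (also an assumption), and sub-microsecond precision is dropped.

```python
import struct
from datetime import datetime, timedelta, timezone

# Julian day number of 1970-01-01, the Unix epoch.
JULIAN_DAY_OF_UNIX_EPOCH = 2440588
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def int96_to_datetime(raw: bytes) -> datetime:
    """Decode a 12-byte int96 timestamp (assumed layout, see above)."""
    if len(raw) != 12:
        raise ValueError("an int96 value must be exactly 12 bytes")
    # "<qi": little-endian, 8-byte nanos-of-day, then 4-byte Julian day.
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    # datetime only holds microseconds, so the last three digits are lost.
    micros = nanos_of_day // 1000
    return UNIX_EPOCH + timedelta(days=days, microseconds=micros)


def datetime_to_int96(value: datetime) -> bytes:
    """Encode a timezone-aware datetime back into the assumed layout."""
    delta = value - UNIX_EPOCH
    micros_of_day = delta.seconds * 1_000_000 + delta.microseconds
    julian_day = delta.days + JULIAN_DAY_OF_UNIX_EPOCH
    return struct.pack("<qi", micros_of_day * 1000, julian_day)
```

Nothing in the bytes themselves marks the value as a timestamp, so a reader has to be told by some outside metadata which int96 columns to decode this way.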

Does that sound like a reasonable way forward?

rb

--
Ryan Blue
Software Engineer
Cloudera, Inc.
