On 03/03/2015 05:32 AM, Guillaume Polaert wrote:
Hi,
Is there someone working on Int96/Impala compatibility for parquet-pig?
As far as I understand, Parquet doesn't have a nanosecond-precision Timestamp
class that can handle this format. Is that right?
At the least, we could provide a feature to map Int96 to DateTime (pigloader) and
vice versa, losing nanosecond precision of course.
What do you think?
I don't think that the Parquet community should add support for int96
timestamps. The int96 timestamp format is undocumented, though
implemented in Hive and Impala. It also uses an unannotated int96, so
there is no way to distinguish between a real int96 and a timestamp.
I don't think that a file format like Parquet should add support for
undocumented types that are specific to an application. Applications are
free to store data in Parquet's types as they like by keeping additional
metadata (the column's timestamp type in Impala), but the format should
only coordinate those higher-level types through annotations.
I think support for Impala's int96 timestamp should be done in a UDF as
was suggested on PARQUET-195, and we should add a nanosecond-precision
timestamp type annotation to coordinate future uses.
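
To make the UDF idea concrete, here is a rough sketch of the conversion it
would need to perform. Since the format is undocumented, the layout is an
assumption based on how Hive and Impala are commonly understood to write it:
12 bytes, little-endian, with the nanoseconds within the day in the first 8
bytes and the Julian day number in the last 4. The function name
int96_to_datetime is hypothetical, and treating the value as UTC is also an
assumption. Python is used only for brevity; a real Pig UDF would be written
against the Pig API.

```python
import struct
from datetime import datetime, timedelta, timezone

# Julian day number that corresponds to 1970-01-01 (the Unix epoch).
JULIAN_DAY_OF_UNIX_EPOCH = 2440588


def int96_to_datetime(raw: bytes) -> datetime:
    """Decode a 12-byte int96 timestamp (assumed Hive/Impala layout).

    Assumed layout, little-endian:
      bytes 0-7  : nanoseconds since midnight (signed 64-bit)
      bytes 8-11 : Julian day number (signed 32-bit)
    The result is reported in UTC, which is an assumption of this sketch.
    """
    if len(raw) != 12:
        raise ValueError("an int96 value must be exactly 12 bytes")

    nanos_of_day, julian_day = struct.unpack("<qi", raw)

    days_since_epoch = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    # datetime only holds microseconds, so sub-microsecond digits are dropped.
    micros_of_day = nanos_of_day // 1000

    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(days=days_since_epoch, microseconds=micros_of_day)


if __name__ == "__main__":
    # One day and one second after the epoch, plus 999 ns that get truncated.
    sample = struct.pack("<qi", 1_000_000_999, JULIAN_DAY_OF_UNIX_EPOCH + 1)
    print(int96_to_datetime(sample).isoformat())
```

Note that the truncation to microseconds is exactly the precision loss
mentioned above, and that nothing in the 12 bytes themselves says "this is a
timestamp": the caller has to know that from outside the file.
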
Does that sound like a reasonable way forward?
rb
--
Ryan Blue
Software Engineer
Cloudera, Inc.