INT96 *should* be reserved for actual integers rather than used as a fixed(12). The type name implies that the value is a single large number.

The only place it is used is for the Impala INT96 timestamp type. That happened because we (Cloudera) didn't discuss how to properly store timestamps with the upstream community. The implementers needed a way to write the type and know it was the timestamp, and using INT96 for that purpose seemed like a good idea at the time, I guess.
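
For readers who have not seen the timestamp encoding, here is a minimal sketch of how the 12 bytes are usually interpreted. It assumes the layout commonly described for Impala's INT96 timestamps: 8 bytes of nanoseconds within the day followed by 4 bytes of Julian day number, both little-endian. The function names are hypothetical and are not part of parquet-mr or Impala.

```python
import struct
from datetime import datetime, timedelta

# Julian day number of 1970-01-01, used to convert to a calendar date.
JULIAN_DAY_OF_UNIX_EPOCH = 2440588


def encode_int96_timestamp(julian_day: int, nanos_of_day: int) -> bytes:
    """Pack a timestamp into 12 bytes (assumed layout: nanos, then Julian day)."""
    return struct.pack("<qi", nanos_of_day, julian_day)


def decode_int96_timestamp(raw: bytes) -> datetime:
    """Decode 12 bytes into a naive datetime (no time zone is implied).

    Python datetimes only hold microseconds, so sub-microsecond digits
    are truncated in this sketch.
    """
    if len(raw) != 12:
        raise ValueError("expected exactly 12 bytes")
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    return datetime(1970, 1, 1) + timedelta(
        days=days_since_epoch, microseconds=nanos_of_day // 1000
    )
```

Note that the 12 bytes hold two separate fields, not one 96-bit integer, which is why the INT96 name is misleading for this use.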

The right way to add a type would have been to discuss the type with the upstream community and add an annotation, along with rules for where that annotation can be used. That allows us to use the right storage (e.g., a 12-byte fixed) and tell the type apart from other data with the same physical type. That's what we're doing with all new types these days.
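
To illustrate the point about annotations, here is a toy model (not the parquet-mr API) of a reader that sees two columns with the same physical type and uses the annotation to tell them apart. The `ColumnType` class, the `interpret` function, and the annotation name `EXAMPLE_TIMESTAMP_NANOS` are all hypothetical, made up for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ColumnType:
    physical: str                     # physical storage type
    length: Optional[int] = None      # byte length for fixed-length types
    annotation: Optional[str] = None  # hypothetical logical annotation


def interpret(col: ColumnType) -> str:
    """Decide how to read a column from its physical type and annotation."""
    if col.physical == "FIXED_LEN_BYTE_ARRAY" and col.length == 12:
        # Same storage either way; only the annotation says what it means.
        if col.annotation == "EXAMPLE_TIMESTAMP_NANOS":
            return "timestamp"
        return "opaque 12-byte value"
    return "other"


timestamp_col = ColumnType("FIXED_LEN_BYTE_ARRAY", 12, "EXAMPLE_TIMESTAMP_NANOS")
plain_col = ColumnType("FIXED_LEN_BYTE_ARRAY", 12)
```

With this shape, `interpret(timestamp_col)` and `interpret(plain_col)` give different answers even though both columns are stored as 12-byte fixed values, which is exactly what a dedicated INT96 physical type was being used to achieve.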

rb

On 06/24/2015 12:34 PM, Cheng Lian wrote:
Hey Parquet devs,

It seems that in parquet-mr, INT96 is always treated as
FIXED_LENGTH_BYTE_ARRAY(12). Is it reasonable to say that INT96
is just a convenient alias of FIXED_LENGTH_BYTE_ARRAY(12)? Are there any
semantic or performance differences? Currently, the only case where I
have found INT96 useful is representing a timestamp type with nanosecond
precision in Impala. Did I miss something here?

Best,
Cheng


--
Ryan Blue
Software Engineer
Cloudera, Inc.
