INT96 *should* be reserved for actual integers, not used as a stand-in
for fixed(12). The type name implies that the value is a single big number.
The only place it is used is for the Impala INT96 timestamp type. That
happened because we (Cloudera) didn't discuss how to properly store
timestamps with the upstream community. The implementers needed a way to
write the type and know it was the timestamp, and using INT96 for that
purpose seemed like a good idea at the time, I guess.
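To illustrate the mismatch, here is a minimal Python sketch that reads the same 12 bytes two ways: as one big integer, and as a timestamp. It assumes the commonly described Impala layout (8 bytes of nanoseconds within the day, then a 4-byte Julian day number, both little-endian). The function names are hypothetical and this is not code from parquet-mr or Impala.

```python
import struct
from datetime import datetime, timedelta

# Julian day number of 1970-01-01 (the Unix epoch).
JULIAN_DAY_OF_UNIX_EPOCH = 2440588


def decode_int96_timestamp(raw: bytes) -> datetime:
    """Interpret 12 bytes as (nanoseconds of day, Julian day).

    Assumed layout: little-endian int64 nanoseconds within the day,
    followed by a little-endian int32 Julian day number.
    """
    if len(raw) != 12:
        raise ValueError("expected exactly 12 bytes")
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    # Python datetimes only carry microseconds, so sub-microsecond
    # precision is dropped here.
    return datetime(1970, 1, 1) + timedelta(
        days=days, microseconds=nanos_of_day // 1000
    )


def as_single_integer(raw: bytes) -> int:
    """Interpret the same 12 bytes as one signed 96-bit integer."""
    return int.from_bytes(raw, "little", signed=True)


if __name__ == "__main__":
    raw = struct.pack("<qi", 3_600_000_000_000, JULIAN_DAY_OF_UNIX_EPOCH + 1)
    print(decode_int96_timestamp(raw))  # the value as a timestamp
    print(as_single_integer(raw))       # the value the name "INT96" suggests
```

The second reading is what the type name suggests, but it is not a meaningful number for this data, which is why a 12-byte fixed would describe the storage more honestly.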
The right way to add a type would have been to discuss the type with the
upstream community and add an annotation, along with rules for where
that annotation can be used. That allows us to use the right storage
(e.g., a 12-byte fixed) and tell the type apart from other data with the
same physical type. That's what we're doing with all new types these days.
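As a rough sketch of that idea (not the actual parquet-format definitions), the Python below models a column as a physical type plus an optional annotation, with a rule table saying where each annotation may be used. The annotation name TIMESTAMP_12 and all class and function names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Column:
    name: str
    physical_type: str                 # how the bytes are stored
    type_length: Optional[int] = None  # only meaningful for fixed-length types
    annotation: Optional[str] = None   # how the bytes should be interpreted


# Hypothetical rule table: annotation -> (required physical type, required length).
ANNOTATION_RULES = {
    "TIMESTAMP_12": ("FIXED_LEN_BYTE_ARRAY", 12),
}


def check_annotation(col: Column) -> None:
    """Raise ValueError if the annotation is used where the rules forbid it."""
    if col.annotation is None:
        return  # plain data, nothing to check
    if col.annotation not in ANNOTATION_RULES:
        raise ValueError(f"unknown annotation {col.annotation!r}")
    required = ANNOTATION_RULES[col.annotation]
    if (col.physical_type, col.type_length) != required:
        raise ValueError(
            f"{col.annotation} requires {required}, "
            f"got {(col.physical_type, col.type_length)}"
        )


# Same physical type, told apart only by the annotation.
ts = Column("event_time", "FIXED_LEN_BYTE_ARRAY", 12, "TIMESTAMP_12")
blob = Column("digest", "FIXED_LEN_BYTE_ARRAY", 12)
check_annotation(ts)
check_annotation(blob)
```

Both columns share one storage type, and a reader can still distinguish the timestamp from opaque 12-byte data because the annotation carries the meaning.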
rb
On 06/24/2015 12:34 PM, Cheng Lian wrote:
Hey Parquet devs,
It seems that in parquet-mr, INT96 is always treated as
FIXED_LEN_BYTE_ARRAY(12). I wonder whether it is reasonable to say that
INT96 is just a convenient alias of FIXED_LEN_BYTE_ARRAY(12). Are there
any semantic or performance differences? So far, the only case where I
have found INT96 useful is representing the timestamp type with
nanosecond precision in Impala. Did I miss something here?
Best,
Cheng
--
Ryan Blue
Software Engineer
Cloudera, Inc.