Re: Is INT96 just an alias of FIXED_LENGTH_BYTE_ARRAY(12)?

Cheng Lian Sun, 28 Jun 2015 01:41:28 -0700

Yeah, initial nanosec timestamp support in Spark SQL follows Impala and uses INT96 to improve interoperability with Impala. In Spark 1.5.0-SNAPSHOT (the current master branch), although we still write timestamps as INT96, internally Spark SQL only uses a LONG to represent timestamps for better performance. The cost is that the precision is lowered to 100ns.
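To make the trade-off concrete, here is a rough sketch of how a 12-byte INT96 timestamp maps onto a single integer. It assumes the layout Impala and Hive are generally understood to write: 8 bytes of nanoseconds within the day followed by a 4-byte Julian day number, both little-endian. The function names are made up for illustration and this is not Spark's actual code path, just the arithmetic.

```python
import struct

# Julian day number of 1970-01-01 (the Unix epoch).
JULIAN_DAY_OF_UNIX_EPOCH = 2440588
NANOS_PER_DAY = 86_400 * 1_000_000_000


def int96_to_epoch_nanos(raw: bytes) -> int:
    """Decode a 12-byte INT96 timestamp into nanoseconds since the Unix epoch.

    Assumed layout: little-endian int64 nanos-of-day, then little-endian
    int32 Julian day.
    """
    if len(raw) != 12:
        raise ValueError("INT96 value must be exactly 12 bytes")
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return (julian_day - JULIAN_DAY_OF_UNIX_EPOCH) * NANOS_PER_DAY + nanos_of_day


def epoch_nanos_to_int96(epoch_nanos: int) -> bytes:
    """Encode nanoseconds since the Unix epoch into the assumed INT96 layout."""
    days, nanos_of_day = divmod(epoch_nanos, NANOS_PER_DAY)
    return struct.pack("<qi", nanos_of_day, days + JULIAN_DAY_OF_UNIX_EPOCH)


def to_100ns_ticks(epoch_nanos: int) -> int:
    """Truncate to 100ns units, which drops the last two decimal digits."""
    return epoch_nanos // 100
```

Round-tripping a value through to_100ns_ticks and multiplying back by 100 loses the final two digits of the nanosecond field, which is the precision cost mentioned above.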

Since INT96 is being deprecated, what's the suggested/planned way to read/write high precision nanosec timestamps then? Spark SQL, Hive, and Impala all have a nanosec timestamp type, while the Parquet format spec doesn't include one (only TIMESTAMP_MILLIS and TIMESTAMP_MICROS are available for now). Should we add a TIMESTAMP_NANOS annotation over FIXED_LENGTH_BYTE_ARRAY(12) and corresponding backwards-compatibility rules?


Cheng

On 6/24/15 1:21 PM, Nathan Howell wrote:

On 6/24/15, 1:17 PM, "Ryan Blue" <[email protected]> wrote:

:(

We'll want to deprecate those and move away from them. We're trying to
get support for real timestamps, along with backward-compatibility for
existing data, as soon as possible. I'm trying to get a commitment for
the next point release of CDH to fix it.

Actually it seems to have been added in 1.3.0, not 1.4.0:

https://issues.apache.org/jira/browse/SPARK-4987


-n
