[
https://issues.apache.org/jira/browse/PARQUET-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ryan Blue updated PARQUET-200:
------------------------------
Description:
When the date/time type annotations were added, we decided not to add
precisions smaller than milliseconds because there wasn't a clear requirement.
I think that the requirement is for nanosecond precision. The SQL spec requires
at least microsecond. Some databases support nanosecond, including SQL engines
on Hadoop like Phoenix. Hive and Impala currently support nanosecond times
using an int96, but intend to move to microsecond precision with this spec.
I propose adding the following type annotations:
* {{TIME_MICROS}}: annotates an int64 (8 bytes) that represents the number of
microseconds from midnight.
* {{TIMESTAMP_MICROS}}: annotates an int64 (8 bytes) that represents the number
of microseconds from the Unix epoch.
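As a rough illustration of the two proposed annotations, the sketch below converts Python values into the int64 counts they describe. The function names are hypothetical, and treating timestamps as UTC and packing the int64 as 8 little-endian bytes are assumptions made for the example; the description above does not specify either.

```python
import struct
from datetime import datetime, time, timedelta, timezone

# Assumption: timestamps are measured from 1970-01-01T00:00:00 UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def time_to_micros(t: time) -> int:
    """Microseconds from midnight, the value TIME_MICROS would annotate."""
    seconds = (t.hour * 60 + t.minute) * 60 + t.second
    return seconds * 1_000_000 + t.microsecond


def timestamp_to_micros(dt: datetime) -> int:
    """Microseconds from the Unix epoch, the value TIMESTAMP_MICROS would annotate."""
    delta = dt - EPOCH
    # Integer arithmetic avoids the float rounding of total_seconds().
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def pack_int64(value: int) -> bytes:
    """Pack a signed 64-bit value as 8 little-endian bytes (assumed layout)."""
    return struct.pack("<q", value)


if __name__ == "__main__":
    print(time_to_micros(time(12, 34, 56, 789012)))
    print(timestamp_to_micros(datetime(2015, 3, 1, tzinfo=timezone.utc)))
    print(pack_int64(time_to_micros(time(0, 0, 1))).hex())
```

Both values fit comfortably in a signed int64: a day holds fewer than 2^37 microseconds, and microseconds from the epoch cover roughly 292,000 years in either direction.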
was:
When the date/time type annotations were added, we decided not to add
precisions smaller than milliseconds because there wasn't a clear requirement.
I think that the requirement is for nanosecond precision. The SQL spec requires
at least microsecond, and many databases support nanosecond, including SQL
engines on Hadoop: Hive, Phoenix, and Impala.
I propose adding the following type annotations:
* {{TIME_NANOS}}: annotates an int64 (8 bytes), represents the number of
nanoseconds from midnight.
* {{TIMESTAMP_NANOS}}: annotates a 12-byte fixed, containing first an 8-byte
number of milliseconds from unix epoch and, second, a 4-byte number of
nanoseconds from the 8-byte time (nanoseconds from the last millisecond). Both
values are little-endian.
The timestamp type allows object models that don't support nanosecond times (or
don't need it for processing) to easily ignore the second value.
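For reference, the 12-byte layout in this earlier wording could be sketched as follows. The function names are hypothetical, and reading the 4-byte field as an unsigned integer is an assumption; the text only says both values are little-endian.

```python
import struct

NANOS_PER_MILLI = 1_000_000


def pack_timestamp_nanos(epoch_nanos: int) -> bytes:
    """Pack nanoseconds from the Unix epoch into the 12-byte layout described above."""
    # Floor division keeps the nanosecond remainder non-negative for pre-epoch values.
    millis, nanos_in_milli = divmod(epoch_nanos, NANOS_PER_MILLI)
    # "<" disables padding: 8-byte signed millis, then 4-byte unsigned nanos.
    return struct.pack("<qI", millis, nanos_in_milli)


def unpack_timestamp_nanos(raw: bytes) -> int:
    """Recover nanoseconds from the Unix epoch from the 12-byte value."""
    millis, nanos_in_milli = struct.unpack("<qI", raw)
    return millis * NANOS_PER_MILLI + nanos_in_milli


def read_millis_only(raw: bytes) -> int:
    """What a millisecond-only object model would do: ignore the second value."""
    return struct.unpack_from("<q", raw, 0)[0]
```

Because the millisecond count comes first, a reader without nanosecond support can decode the leading 8 bytes and skip the remaining 4.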
> Add nanosecond time and timestamp annotations
> ---------------------------------------------
>
> Key: PARQUET-200
> URL: https://issues.apache.org/jira/browse/PARQUET-200
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-format
> Reporter: Ryan Blue