[ 
https://issues.apache.org/jira/browse/IMPALA-7853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16715792#comment-16715792
 ] 

ASF subversion and git services commented on IMPALA-7853:
---------------------------------------------------------

Commit 56dd5767b87d13e467e88aa20fe33149681afc1e in impala's branch 
refs/heads/master from [~csringhofer]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=56dd576 ]

IMPALA-7853: Add support to read int64 NANO timestamps from Parquet

PARQUET-1387 added int64 timestamps with nanosecond precision, which
store timestamps as nanoseconds since the Unix epoch.
As 64 bits are not enough to represent the whole 1400..9999 range
of Impala timestamps, this new type works with a limited range:
1677-09-21 00:12:43.145224192 .. 2262-04-11 23:47:16.854775807 UTC
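
The two endpoints above are the smallest and largest signed 64-bit
integers, read as nanoseconds since the Unix epoch. A minimal Python
sketch of that arithmetic (not Impala code; `format_nanos` is a
hypothetical helper introduced only for illustration):

```python
from datetime import datetime, timedelta

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
EPOCH = datetime(1970, 1, 1)
NANOS_PER_SECOND = 1_000_000_000


def format_nanos(nanos: int) -> str:
    """Render nanoseconds since the Unix epoch as a UTC timestamp string."""
    # divmod floors toward negative infinity, so the fractional part is
    # always in [0, 1e9), including for pre-1970 values.
    seconds, frac = divmod(nanos, NANOS_PER_SECOND)
    whole = EPOCH + timedelta(seconds=seconds)
    # datetime only carries microseconds, so the 9-digit fraction is
    # appended separately.
    return f"{whole.isoformat(sep=' ')}.{frac:09d}"


print(format_nanos(INT64_MIN))  # 1677-09-21 00:12:43.145224192
print(format_nanos(INT64_MAX))  # 2262-04-11 23:47:16.854775807
```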

The benefit of the reduced range is that no validation is necessary
during scanning, as every possible 64 bit value represents a valid
timestamp in Impala. This potentially makes it the fastest way to
store timestamps in Impala + Parquet.
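
The "no validation" claim can be checked with plain integer arithmetic.
The sketch below is an illustration, not the scanner's logic. It assumes
the Impala range runs from 1400-01-01 00:00:00 to the last instant of
9999-12-31, and `needs_range_check` is a hypothetical name. It asks
whether some int64 value falls outside that range for a given unit:

```python
from datetime import datetime, timedelta

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
EPOCH = datetime(1970, 1, 1)
ONE_SECOND = timedelta(seconds=1)

# Assumed Impala range, expressed as whole seconds since the Unix epoch.
IMPALA_MIN_SECONDS = (datetime(1400, 1, 1) - EPOCH) // ONE_SECOND
IMPALA_MAX_SECONDS = (datetime(9999, 12, 31, 23, 59, 59) - EPOCH) // ONE_SECOND


def needs_range_check(ticks_per_second: int) -> bool:
    """True if some int64 tick count lies outside the assumed Impala range."""
    lowest_valid = IMPALA_MIN_SECONDS * ticks_per_second
    # Last tick of the final valid second.
    highest_valid = (IMPALA_MAX_SECONDS + 1) * ticks_per_second - 1
    return INT64_MIN < lowest_valid or INT64_MAX > highest_valid


for name, ticks in (("MILLI", 10**3), ("MICRO", 10**6), ("NANO", 10**9)):
    print(name, needs_range_check(ticks))
```

Under these assumptions only NANO reports that no range check is needed,
because the whole int64 range fits inside 1400..9999 at that resolution.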

Another way NANO differs from MICRO and MILLI is that NANO can
only be described with the new logical types in Parquet; it has no
converted type equivalent. This made implementing CREATE TABLE
LIKE PARQUET less trivial than it was for MICRO/MILLI: the type
conversion logic in ParquetHelper.java had to be rewritten to
use LogicalTypeAnnotation instead of ConvertedType.
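
The difference between the two annotation schemes can be sketched as a
lookup table. This is a simplified Python illustration, not the code in
ParquetHelper.java: it ignores the isAdjustedToUTC flag, and the names
`LEGACY_CONVERTED_TYPE` and `timestamp_unit` are hypothetical.

```python
from typing import Optional

# Legacy converted type for each TIMESTAMP logical-type unit.
# NANOS has no legacy equivalent, hence None.
LEGACY_CONVERTED_TYPE = {
    "MILLIS": "TIMESTAMP_MILLIS",
    "MICROS": "TIMESTAMP_MICROS",
    "NANOS": None,
}


def timestamp_unit(logical_unit: Optional[str],
                   converted_type: Optional[str]) -> Optional[str]:
    """Resolve the timestamp unit, preferring the logical type annotation."""
    if logical_unit is not None:
        return logical_unit
    # Fall back to the legacy converted type, which can only ever yield
    # MILLIS or MICROS.
    for unit, legacy in LEGACY_CONVERTED_TYPE.items():
        if legacy is not None and legacy == converted_type:
            return unit
    return None


print(timestamp_unit("NANOS", None))             # NANOS
print(timestamp_unit(None, "TIMESTAMP_MICROS"))  # MICROS
print(timestamp_unit(None, None))                # None
```

A reader that inspects only the converted type never sees a NANOS
column as a timestamp, which is why the conversion logic had to move to
the logical type annotation.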

The changes on the Java side also required bumping
CDH_BUILD_NUMBER.

Testing:
- added a new test file with int64 nano timestamps
- ran core tests

Change-Id: I932396d8646f43c0b9ca4a6359f164c4d8349d8f
Reviewed-on: http://gerrit.cloudera.org:8080/11984
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> Add support to read int64 NANO timestamps to the parquet scanner
> ----------------------------------------------------------------
>
>                 Key: IMPALA-7853
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7853
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Csaba Ringhofer
>            Assignee: Csaba Ringhofer
>            Priority: Major
>              Labels: parquet
>
> PARQUET-1387 added int64 timestamps with nanosecond precision.
> As 64 bits are not enough to represent the whole 1400..9999 range of Impala 
> timestamps, this new type works with a limited range:
> 1677-09-21 00:12:43.145224192 .. 2262-04-11 23:47:16.854775807 UTC
> The benefit of the reduced range is that no validation is necessary during 
> scanning, as every possible 64 bit value represents a valid timestamp in 
> Impala. This potentially makes it the fastest way to store timestamps in 
> Impala + Parquet.
> Another way NANO differs from MICRO and MILLI is that NANO can only be 
> described with the new logical types in Parquet; it has no converted type 
> equivalent.



