[ 
https://issues.apache.org/jira/browse/IMPALA-5942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16842108#comment-16842108
 ] 

Gabor Kaszab commented on IMPALA-5942:
--------------------------------------

I had a short discussion about this topic with [~grahn] the other day, as I ran 
into the handling of dateless timestamps in connection with IMPALA-4018. I think 
we have to consider two different scenarios here:

1) When a datetime pattern is given without specifying the date.
{code:java}
+--------------------------------------+
| to_timestamp('01:50:00', 'hh:mm:ss') |
+--------------------------------------+
| 01:50:00                             |
+--------------------------------------+
{code}
In this case I think Impala should reject the query with an error during the 
pattern analysis. (Note: by analysis I don't mean query analysis in the 
frontend, but rather the parsing of the format in the backend.)
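
To illustrate the kind of check meant here, below is a minimal sketch in Python, not Impala's actual backend code. It assumes SimpleDateFormat-style tokens where `y`, `M` and `d` are the date fields (case-sensitive, so lowercase `m` is minutes), and it ignores quoted literal text in the pattern. The name `validate_pattern` is hypothetical.

```python
# Characters assumed to denote date fields (year, month, day).
# Case matters: 'M' is month, while 'm' is minutes.
DATE_TOKEN_CHARS = frozenset("yMd")


def validate_pattern(pattern: str) -> None:
    """Raise ValueError if the format pattern has no date component.

    Hypothetical helper: a simplified stand-in for the backend's
    format parsing, which would also have to handle literal text.
    """
    if not any(ch in DATE_TOKEN_CHARS for ch in pattern):
        raise ValueError(
            f"Format pattern '{pattern}' does not specify a date component"
        )
```

With this check, a pattern such as 'hh:mm:ss' is rejected before any data is touched, while 'yyyy-MM-dd HH:mm:ss' is accepted.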

2) When no datetime pattern is given but the actual input is dateless.
{code:java}
select cast(field_name as timestamp) from table_name;
{code}
{code:java}
insert into table2_with_timestamp_col select string_col_storing_timestamps from 
table1;
{code}
Here, we can't reject the query with an error as we have no knowledge of the 
data that the query is run on. The options we have here are:
 - return null for dateless timestamp values
 - default their date part to some hardcoded date (such as the smallest date 
Impala's timestamp can hold)
 - default their date part to the current date
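
The three options can be compared side by side with a small sketch. This is Python for illustration only, not the backend implementation. The function name `cast_dateless` and the policy names are hypothetical, and the hardcoded date 1400-01-01 is an assumption taken from the valid range quoted in the warning of the issue description.

```python
from datetime import date, datetime
from typing import Optional

# Assumed smallest date, based on the range 1400-01-01..9999-12-31
# mentioned in the Parquet warning in the issue description.
MIN_DATE = date(1400, 1, 1)


def cast_dateless(time_str: str, policy: str,
                  today: Optional[date] = None) -> Optional[datetime]:
    """Cast a dateless 'HH:MM:SS' string under one of three policies.

    Hypothetical helper; `today` is injectable so the effect of the
    current-date policy can be shown deterministically.
    """
    t = datetime.strptime(time_str, "%H:%M:%S").time()
    if policy == "null":
        # Option 1: dateless values become NULL.
        return None
    if policy == "min_date":
        # Option 2: fill in a hardcoded date.
        return datetime.combine(MIN_DATE, t)
    if policy == "current_date":
        # Option 3: fill in the date the query runs on.
        return datetime.combine(today or date.today(), t)
    raise ValueError(f"unknown policy: {policy}")
```

Calling it with policy "current_date" and two different values of `today` gives two different results for the same input, which is the drawback of the third option.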

My least favourite is the third, because we would end up with different results 
for the same query depending on when we run it.
Between the first two, I feel returning null is the cleaner solution, but this 
is just my impression and not based on any rigorous reasoning.

According to Greg, there are no known users who rely on dateless timestamps, as 
that is kind of an edge case. So I have one question that bothers me:
Isn't this considered a breaking change? Are we flexible enough to deliver 
something like this in a minor release?

> Dateless timestamps (e.g. "10:00:00") are handled inconsistently 
> -----------------------------------------------------------------
>
>                 Key: IMPALA-5942
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5942
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.11.0
>            Reporter: Csaba Ringhofer
>            Priority: Major
>              Labels: timestamp
>
> Impala cannot read back these timestamps from Parquet, while it can read 
> them back from textfiles.
> According to the documentation, Impala should be able to handle these values 
> somehow, as the examples contain "select cast('08:30:00' as timestamp);"
> see http://impala.apache.org/docs/build/html/topics/impala_timestamp.html 
> {code}
> text:
> create table TT1 (t timestamp);
> insert into TT1 (t) values ("10:00:00");
> select * from TT1;
> +----------+
> | t        |
> +----------+
> | 10:00:00 |
> +----------+
> parquet:
> create table TT2(t timestamp) STORED AS PARQUET;
> insert into TT2 (t) values ("10:00:00");
> select * from TT2;
> +------+
> | t    |
> +------+
> | NULL |
> +------+
> WARNINGS: Parquet file 
> 'hdfs://localhost:20500/test-warehouse/tt2/714d741212df3180-cd4e670800000000_226739479_data.0.parq'
>  column 't' contains an out of range timestamp. The valid date range is 
> 1400-01-01..9999-12-31.
> {code}
> I think that this is a side effect of the fix of IMPALA-4363, but I did not 
> check what happens in versions that did not contain this fix.
> UPDATE: I have checked the last commit before the fix of IMPALA-4363, and it 
> does not have this bug.


