[ 
https://issues.apache.org/jira/browse/IMPALA-12691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804512#comment-17804512
 ] 

Gabor Zele commented on IMPALA-12691:
-------------------------------------

Just to explain a bit more, I need the following query to convert some full and 
partial dates stored in strings to timestamps:

 
{noformat}
SELECT
    text_date,
    CAST(
            CONCAT_WS('-',
                
if(regexp_extract(text_date,'(?:[^\\w-]|^)(\\d{4})(?:-(\\d{2}))?(?:-(\\d{2}))?(?:[^\\w-]|$)',1)="",NULL,
                    
regexp_extract(text_date,'(?:[^\\w-]|^)(\\d{4})(?:-(\\d{2}))?(?:-(\\d{2}))?(?:[^\\w-]|$)',1)),
                
if(regexp_extract(text_date,'(?:[^\\w-]|^)(\\d{4})(?:-(\\d{2}))?(?:-(\\d{2}))?(?:[^\\w-]|$)',2)="","01",
                    
regexp_extract(text_date,'(?:[^\\w-]|^)(\\d{4})(?:-(\\d{2}))?(?:-(\\d{2}))?(?:[^\\w-]|$)',2)),
                
if(regexp_extract(text_date,'(?:[^\\w-]|^)(\\d{4})(?:-(\\d{2}))?(?:-(\\d{2}))?(?:[^\\w-]|$)',3)="","01",
                    
regexp_extract(text_date,'(?:[^\\w-]|^)(\\d{4})(?:-(\\d{2}))?(?:-(\\d{2}))?(?:[^\\w-]|$)',3))
            )
    AS TIMESTAMP)FROM ( VALUES 
    ('2001-01-01' as text_date),
    ('2002-01'),
    ('2003'),
    
    ('"2001-02-01"'),
    ('"2002-02"'),
    ('"2003"'),
    
    ('1111-2022-01-05'),
    ('20040101'),
    ('200401')
) AS text;{noformat}
Result:

 
||text_date||ts_date||
|2001-01-01|2001-01-01 00:00:00|
|2002-01|2002-01-01 00:00:00|
|2003|2003-01-01 00:00:00|
|"2001-02-01"|2001-02-01 00:00:00|
|"2002-02"|2002-02-01 00:00:00|
|"2003"|2003-01-01 00:00:00|
|1111-2022-01-05|NULL|
|20040101|NULL|
|200401|NULL|

While this query works well, it is quite complex, which makes it hard to 
understand, modifications more prone to errors.

 

I would prefer if I could do this:
{noformat}
SELECT
    text_date,
    CASE
        WHEN TO_TIMESTAMP(text_date, 'yyyy-MM-dd') IS NOT NULL   THEN 
TO_TIMESTAMP(text_date, 'yyyy-MM-dd')
        WHEN TO_TIMESTAMP(text_date, '"yyyy-MM-dd"') IS NOT NULL THEN 
TO_TIMESTAMP(text_date, '"yyyy-MM-dd"')
        WHEN TO_TIMESTAMP(text_date, 'yyyy-MM') IS NOT NULL      THEN 
TO_TIMESTAMP(text_date, 'yyyy-MM')
        WHEN TO_TIMESTAMP(text_date, '"yyyy-MM"') IS NOT NULL    THEN 
TO_TIMESTAMP(text_date, '"yyyy-MM"')
        WHEN TO_TIMESTAMP(text_date, 'yyyy') IS NOT NULL         THEN 
TO_TIMESTAMP(text_date, 'yyyy')
        WHEN TO_TIMESTAMP(text_date, '"yyyy"') IS NOT NULL       THEN 
TO_TIMESTAMP(text_date, '"yyyy"')
        ELSE NULL
    END as ts_date
FROM ( VALUES 
    ('2001-01-01' as text_date),
    ('2002-01'),
    ('2003'),
    
    ('"2001-02-01"'),
    ('"2002-02"'),
    ('"2003"'),
    
    ('1111-2022-01-05'),
    ('20040101'),
    ('200401')
) AS text;
{noformat}
But right now, this won't give me any good results on the partial dates:
||text_date||ts_date||
|2001-01-01|2001-01-01 00:00:00|
|*2002-01*|*NULL*|
|*2003*|*NULL*|
|"2001-02-01"|2001-02-01 00:00:00|
|*"2002-02"*|*NULL*|
|*"2003"*|*NULL*|
|1111-2022-01-05|NULL|
|20040101|NULL|
|200401|NULL|

> conversion of partial date strings to timestamp
> -----------------------------------------------
>
>                 Key: IMPALA-12691
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12691
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Gabor Zele
>            Priority: Major
>
> Currently the TO_TIMESTAMP(STRING date, STRING pattern) function will only 
> convert a string to timestamp if it has year, month and day parts.
> If I'd like to convert a single year or year/month combination:
> {noformat}
> SELECT
>     TO_TIMESTAMP('2024-01','yyyy-MM'),
>     TO_TIMESTAMP('2024','yyyy')
> ;{noformat}
> I always get NULL as result.
> To be able to convert these values, especially if there string contains 
> something else as well, requires regexp_extract() and some other functions to 
> match the partial format and expand it with "01" as default values for 
> day/month so CAST/to_timestamp() will work eventually.
> If Impala could do the same based on a partial string pattern, then these 
> kind of conversions would be much simpler and maintainable.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to