[
https://issues.apache.org/jira/browse/IMPALA-12691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804512#comment-17804512
]
Gabor Zele edited comment on IMPALA-12691 at 1/8/24 11:22 PM:
--------------------------------------------------------------
Just to explain a bit more, I need the following query to convert some full and
partial dates stored in strings to timestamps:
{noformat}
SELECT
text_date,
CAST(
CONCAT_WS('-',
if(regexp_extract(text_date,'(?:[^\\w-]|^)(\\d{4})(?:-(\\d{2}))?(?:-(\\d{2}))?(?:[^\\w-]|$)',1)="",NULL,
regexp_extract(text_date,'(?:[^\\w-]|^)(\\d{4})(?:-(\\d{2}))?(?:-(\\d{2}))?(?:[^\\w-]|$)',1)),
if(regexp_extract(text_date,'(?:[^\\w-]|^)(\\d{4})(?:-(\\d{2}))?(?:-(\\d{2}))?(?:[^\\w-]|$)',2)="","01",
regexp_extract(text_date,'(?:[^\\w-]|^)(\\d{4})(?:-(\\d{2}))?(?:-(\\d{2}))?(?:[^\\w-]|$)',2)),
if(regexp_extract(text_date,'(?:[^\\w-]|^)(\\d{4})(?:-(\\d{2}))?(?:-(\\d{2}))?(?:[^\\w-]|$)',3)="","01",
regexp_extract(text_date,'(?:[^\\w-]|^)(\\d{4})(?:-(\\d{2}))?(?:-(\\d{2}))?(?:[^\\w-]|$)',3))
)
AS TIMESTAMP)
FROM ( VALUES
('2001-01-01' as text_date),
('2002-01'),
('2003'),
('"2001-02-01"'),
('"2002-02"'),
('"2003"'),
('1111-2022-01-05'),
('20040101'),
('200401')
) AS text;{noformat}
Result:
||text_date||ts_date||
|2001-01-01|2001-01-01 00:00:00|
|2002-01|2002-01-01 00:00:00|
|2003|2003-01-01 00:00:00|
|"2001-02-01"|2001-02-01 00:00:00|
|"2002-02"|2002-02-01 00:00:00|
|"2003"|2003-01-01 00:00:00|
|1111-2022-01-05|NULL|
|20040101|NULL|
|200401|NULL|
While this query works well, it is quite complex, which makes it hard to
understand, modifications more prone to errors.
I would prefer if I could do this:
{noformat}
SELECT
text_date,
CASE
WHEN TO_TIMESTAMP(text_date, 'yyyy-MM-dd') IS NOT NULL THEN
TO_TIMESTAMP(text_date, 'yyyy-MM-dd')
WHEN TO_TIMESTAMP(text_date, '"yyyy-MM-dd"') IS NOT NULL THEN
TO_TIMESTAMP(text_date, '"yyyy-MM-dd"')
WHEN TO_TIMESTAMP(text_date, 'yyyy-MM') IS NOT NULL THEN
TO_TIMESTAMP(text_date, 'yyyy-MM')
WHEN TO_TIMESTAMP(text_date, '"yyyy-MM"') IS NOT NULL THEN
TO_TIMESTAMP(text_date, '"yyyy-MM"')
WHEN TO_TIMESTAMP(text_date, 'yyyy') IS NOT NULL THEN
TO_TIMESTAMP(text_date, 'yyyy')
WHEN TO_TIMESTAMP(text_date, '"yyyy"') IS NOT NULL THEN
TO_TIMESTAMP(text_date, '"yyyy"')
ELSE NULL
END as ts_date
FROM ( VALUES
('2001-01-01' as text_date),
('2002-01'),
('2003'),
('"2001-02-01"'),
('"2002-02"'),
('"2003"'),
('1111-2022-01-05'),
('20040101'),
('200401')
) AS text;
{noformat}
But right now, this won't give me any good results on the partial dates:
||text_date||ts_date||
|2001-01-01|2001-01-01 00:00:00|
|*2002-01*|*NULL*|
|*2003*|*NULL*|
|"2001-02-01"|2001-02-01 00:00:00|
|*"2002-02"*|*NULL*|
|*"2003"*|*NULL*|
|1111-2022-01-05|NULL|
|20040101|NULL|
|200401|NULL|
was (Author: zegab):
Just to explain a bit more, I need the following query to convert some full and
partial dates stored in strings to timestamps:
{noformat}
SELECT
text_date,
CAST(
CONCAT_WS('-',
if(regexp_extract(text_date,'(?:[^\\w-]|^)(\\d{4})(?:-(\\d{2}))?(?:-(\\d{2}))?(?:[^\\w-]|$)',1)="",NULL,
regexp_extract(text_date,'(?:[^\\w-]|^)(\\d{4})(?:-(\\d{2}))?(?:-(\\d{2}))?(?:[^\\w-]|$)',1)),
if(regexp_extract(text_date,'(?:[^\\w-]|^)(\\d{4})(?:-(\\d{2}))?(?:-(\\d{2}))?(?:[^\\w-]|$)',2)="","01",
regexp_extract(text_date,'(?:[^\\w-]|^)(\\d{4})(?:-(\\d{2}))?(?:-(\\d{2}))?(?:[^\\w-]|$)',2)),
if(regexp_extract(text_date,'(?:[^\\w-]|^)(\\d{4})(?:-(\\d{2}))?(?:-(\\d{2}))?(?:[^\\w-]|$)',3)="","01",
regexp_extract(text_date,'(?:[^\\w-]|^)(\\d{4})(?:-(\\d{2}))?(?:-(\\d{2}))?(?:[^\\w-]|$)',3))
)
AS TIMESTAMP)FROM ( VALUES
('2001-01-01' as text_date),
('2002-01'),
('2003'),
('"2001-02-01"'),
('"2002-02"'),
('"2003"'),
('1111-2022-01-05'),
('20040101'),
('200401')
) AS text;{noformat}
Result:
||text_date||ts_date||
|2001-01-01|2001-01-01 00:00:00|
|2002-01|2002-01-01 00:00:00|
|2003|2003-01-01 00:00:00|
|"2001-02-01"|2001-02-01 00:00:00|
|"2002-02"|2002-02-01 00:00:00|
|"2003"|2003-01-01 00:00:00|
|1111-2022-01-05|NULL|
|20040101|NULL|
|200401|NULL|
While this query works well, it is quite complex, which makes it hard to
understand, modifications more prone to errors.
I would prefer if I could do this:
{noformat}
SELECT
text_date,
CASE
WHEN TO_TIMESTAMP(text_date, 'yyyy-MM-dd') IS NOT NULL THEN
TO_TIMESTAMP(text_date, 'yyyy-MM-dd')
WHEN TO_TIMESTAMP(text_date, '"yyyy-MM-dd"') IS NOT NULL THEN
TO_TIMESTAMP(text_date, '"yyyy-MM-dd"')
WHEN TO_TIMESTAMP(text_date, 'yyyy-MM') IS NOT NULL THEN
TO_TIMESTAMP(text_date, 'yyyy-MM')
WHEN TO_TIMESTAMP(text_date, '"yyyy-MM"') IS NOT NULL THEN
TO_TIMESTAMP(text_date, '"yyyy-MM"')
WHEN TO_TIMESTAMP(text_date, 'yyyy') IS NOT NULL THEN
TO_TIMESTAMP(text_date, 'yyyy')
WHEN TO_TIMESTAMP(text_date, '"yyyy"') IS NOT NULL THEN
TO_TIMESTAMP(text_date, '"yyyy"')
ELSE NULL
END as ts_date
FROM ( VALUES
('2001-01-01' as text_date),
('2002-01'),
('2003'),
('"2001-02-01"'),
('"2002-02"'),
('"2003"'),
('1111-2022-01-05'),
('20040101'),
('200401')
) AS text;
{noformat}
But right now, this won't give me any good results on the partial dates:
||text_date||ts_date||
|2001-01-01|2001-01-01 00:00:00|
|*2002-01*|*NULL*|
|*2003*|*NULL*|
|"2001-02-01"|2001-02-01 00:00:00|
|*"2002-02"*|*NULL*|
|*"2003"*|*NULL*|
|1111-2022-01-05|NULL|
|20040101|NULL|
|200401|NULL|
> conversion of partial date strings to timestamp
> -----------------------------------------------
>
> Key: IMPALA-12691
> URL: https://issues.apache.org/jira/browse/IMPALA-12691
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Gabor Zele
> Priority: Major
>
> Currently the TO_TIMESTAMP(STRING date, STRING pattern) function will only
> convert a string to timestamp if it has year, month and day parts.
> If I'd like to convert a single year or year/month combination:
> {noformat}
> SELECT
> TO_TIMESTAMP('2024-01','yyyy-MM'),
> TO_TIMESTAMP('2024','yyyy')
> ;{noformat}
> I always get NULL as result.
> To be able to convert these values, especially if there string contains
> something else as well, requires regexp_extract() and some other functions to
> match the partial format and expand it with "01" as default values for
> day/month so CAST/to_timestamp() will work eventually.
> If Impala could do the same based on a partial string pattern, then these
> kind of conversions would be much simpler and maintainable.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]