[ 
https://issues.apache.org/jira/browse/CALCITE-6624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17890424#comment-17890424
 ] 

Dmitry Sysolyatin edited comment on CALCITE-6624 at 10/17/24 11:43 AM:
-----------------------------------------------------------------------

I did some further investigation into the issue with parsing literals and data 
types. Although this task isn't directly related to parsing literals, it might 
be useful to consider if Calcite should eventually handle other custom data 
type literals in a similar way to how it does for BigQuery.
 # MySQL does not support defining {{{}DATETIME{}}}` literals with syntax like 
{{SELECT DATETIME '2020-01-01 10:10:10'}} [1]. The test I wrote initially was 
based on the task comment CALCITE-6177, but it is incorrect for MySQL syntax. 
The correct syntax should be: {{{}SELECT CAST(CAST(TIMESTAMP '1000-01-01 
00:00:00' AS DATETIME) AS DATETIME){}}}. MySQL's SQL syntax is quite messy. 
They have both {{TIMESTAMP}} and {{DATETIME}} types, but when someone uses 
{{{}TIMESTAMP '1000-01-01 00:00:00'{}}}, it results in a {{DATETIME}} type [1].
??The TIMESTAMP syntax produces a DATETIME value in MySQL because DATETIME has 
a range that more closely corresponds to the standard SQL TIMESTAMP type, which 
has a year range from 0001 to 9999. (The MySQL TIMESTAMP year range is 1970 to 
2038.)??
 # I tried to find other databases that have custom data types and allow 
defining literals like {{<TYPENAME> <TYPEVALUE>}} for custom types. It seems 
only BigQuery and PostgreSQL [2] do this. Other databases, use functions 
approach such as {{toDateTime64(<value>)}} (Clickhouse), 
{{XMLType('xml_string')}} (Oracle) or just plain CAST.
If In BigQuery case, it's just {{{}DATETIME{}}}, then PostgreSQL has many 
custom types, like {{{}float8, float4, timestamptz, inet, cidr{}}}, and more. 
I’m not sure if it’s a good idea to define a keyword for each data type in 
`Parser.jj`. I think the PostgreSQL implementation should define custom type 
names as `nonReservedKeywordsToAdd` in the `config.fmpp` file of the Babel 
modules (under the `keywords` section) and add parsing for such literals in the 
`literalParserMethods` section. Probably the same can be valid for BigQuery 
literal as well.

 

 

Regarding parsing custom type names, as [~julianhyde] mentioned. Yes, I think 
it is better approach. There is a generic way to parse custom type names using 
`CompoundIdentifier`:
{code:java}
        typeName = CompoundIdentifier() {
            typeNameSpec = new SqlUserDefinedTypeNameSpec(typeName, 
s.end(this));
        }
{code}
However, this doesn’t work for `DATETIME` because it’s marked as a reserved 
keyword. There are two options to fix this:
 # Add <DATETIME> to nonReservedKeywordsToAdd section of babel config.fmpp
 # Add DATETIME data type name parsing inside dataTypeParserMethods section

I prefer the first approach.

[1] [https://dev.mysql.com/doc/refman/8.4/en/date-and-time-literals.html]
[2] 
[https://www.postgresql.org/docs/15/sql-syntax-lexical.html#SQL-SYNTAX-CONSTANTS-GENERIC]


was (Author: dmsysolyatin):
I did some further investigation into the issue with parsing literals and data 
types. Although this task isn't directly related to parsing literals, it might 
be useful to consider if Calcite should eventually handle other custom data 
type literals in a similar way to how it does for BigQuery.
 # MySQL does not support defining {{{}DATETIME{}}}` literals with syntax like 
{{SELECT DATETIME '2020-01-01 10:10:10'}} [1]. The test I wrote initially was 
based on the task comment CALCITE-6177, but it is incorrect for MySQL syntax. 
The correct syntax should be: {{{}SELECT CAST(CAST(TIMESTAMP '1000-01-01 
00:00:00' AS DATETIME) AS DATETIME){}}}. MySQL's SQL syntax is quite messy. 
They have both {{TIMESTAMP}} and {{DATETIME}} types, but when someone uses 
{{{}TIMESTAMP '1000-01-01 00:00:00'{}}}, it results in a {{DATETIME}} type [1].
??The TIMESTAMP syntax produces a DATETIME value in MySQL because DATETIME has 
a range that more closely corresponds to the standard SQL TIMESTAMP type, which 
has a year range from 0001 to 9999. (The MySQL TIMESTAMP year range is 1970 to 
2038.)??
 # I tried to find other databases that have custom data types and allow 
defining literals like {{<TYPENAME> <TYPEVALUE>}} for custom types. It seems 
only BigQuery and PostgreSQL [2] do this. Other databases, use functions 
approach such as {{toDateTime64(<value>)}} (Clickhouse), 
{{XMLType('xml_string')}} (Oracle) or just plain CAST.
If In BigQuery case, it's just {{{}DATETIME{}}}, then PostgreSQL has many 
custom types, like {{{}float8, float4, timestamptz, inet, cidr{}}}, and more. 
I’m not sure if it’s a good idea to define a keyword for each data type in 
`Parser.jj`. I think the PostgreSQL implementation should define custom type 
names as `nonReservedKeywordsToAdd` in the `config.fmpp` file of the Babel 
modules (under the `keywords` section) and add parsing for such literals in the 
`literalParserMethods` section. Probably the same can be valid for BigQuery 
literal as well.

Regarding parsing custom type names, as [~julianhyde] mentioned. Yes, I think 
it is better approach. There is a generic way to parse custom type names using 
`CompoundIdentifier`:
{code:java}
        typeName = CompoundIdentifier() {
            typeNameSpec = new SqlUserDefinedTypeNameSpec(typeName, 
s.end(this));
        }
{code}
However, this doesn’t work for `DATETIME` because it’s marked as a reserved 
keyword. There are two options to fix this:
 # Add <DATETIME> to nonReservedKeywordsToAdd section of babel config.fmpp
 # Add DATETIME data type name parsing inside dataTypeParserMethods section

I prefer the first approach.

[1] [https://dev.mysql.com/doc/refman/8.4/en/date-and-time-literals.html]
[2] 
[https://www.postgresql.org/docs/15/sql-syntax-lexical.html#SQL-SYNTAX-CONSTANTS-GENERIC]

> SqlParser should parse MySQL DATETIME type
> ------------------------------------------
>
>                 Key: CALCITE-6624
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6624
>             Project: Calcite
>          Issue Type: Bug
>          Components: babel, core
>    Affects Versions: 1.37.0
>            Reporter: Dmitry Sysolyatin
>            Assignee: Dmitry Sysolyatin
>            Priority: Major
>              Labels: pull-request-available
>
> MySQL has two different data types: TIMESTAMP and DATETIME. The difference 
> between them is the range they support.
> From the documentation [1]
> ??The TIMESTAMP data type is used for values that contain both date and time 
> parts. TIMESTAMP has a range of '1970-01-01 00:00:01' UTC to '2038-01-19 
> 03:14:07' UTC.??
> ??The DATETIME type is used for values that contain both date and time parts. 
> MySQL retrieves and displays DATETIME values in 'YYYY-MM-DD hh:mm:ss' format. 
> The supported range is '1000-01-01 00:00:00' to '9999-12-31 23:59:59'.??
> Calcite's TIMESTAMP likely supports both ranges, and for unparse logic, the 
> MySQL dialect class always uses DATETIME because the TIMESTAMP range is a 
> subset of the DATETIME range.
> The only missing part is parsing the DATETIME datatype. For example
> {code:java}
> SELECT CAST(timestamp_field AS DATETIME) FROM <table>
> {code}
> [1] [https://dev.mysql.com/doc/refman/8.4/en/datetime.html]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to