Hi all,

Thanks for your comments. I think everyone on the thread has agreed that:

(1) The return values of CURRENT_TIME/CURRENT_TIMESTAMP/NOW()/PROCTIME() are wrong.

(2) LOCALTIME/LOCALTIMESTAMP and CURRENT_TIME/CURRENT_TIMESTAMP should behave differently, both from the SQL standard's perspective and judging by mature systems.

(3) The semantics of the three TIMESTAMP types in Flink SQL follow the SQL standard and stay consistent with other 'good' vendors:

TIMESTAMP => A literal in 'yyyy-MM-dd HH:mm:ss' format that describes a time; it contains no time zone info and cannot represent an absolute time point.

TIMESTAMP WITH LOCAL TIME ZONE => Records the elapsed time from the absolute epoch origin; it can represent an absolute time point, and it requires the local time zone to be expressed in 'yyyy-MM-dd HH:mm:ss' format.

TIMESTAMP WITH TIME ZONE => Consists of time zone info plus a literal in 'yyyy-MM-dd HH:mm:ss' format to describe a time; it can represent an absolute time point.
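To make the three semantics concrete, here is a small java.time sketch. The mapping to LocalDateTime/Instant/OffsetDateTime is just the conventional analogy (using the 12:03:35 Beijing-time example from later in this thread), not Flink's internal representation:

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class ThreeTimestampKinds {
    public static void main(String[] args) {
        // TIMESTAMP: just a wall-clock literal with no zone -> not an absolute time point.
        LocalDateTime ts = LocalDateTime.parse("2021-01-21T12:03:35");

        // TIMESTAMP WITH LOCAL TIME ZONE: an absolute point on the time line;
        // the session zone (here UTC+8) is only needed to render it as a literal.
        Instant tsLtz = ts.toInstant(ZoneOffset.ofHours(8));
        System.out.println(tsLtz);                                               // 2021-01-21T04:03:35Z
        System.out.println(tsLtz.atZone(ZoneId.of("+08:00")).toLocalDateTime()); // 2021-01-21T12:03:35

        // TIMESTAMP WITH TIME ZONE: the literal plus an explicit offset is stored.
        OffsetDateTime tsTz = ts.atOffset(ZoneOffset.ofHours(8));
        System.out.println(tsTz); // 2021-01-21T12:03:35+08:00
    }
}
```

The same instant renders differently per session zone for WITH LOCAL TIME ZONE, while plain TIMESTAMP carries no instant at all.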
Currently we have two options to correct CURRENT_TIME/CURRENT_TIMESTAMP/NOW()/PROCTIME().

Option (1): As the FLIP proposed, change the returned value from the UTC time zone to the local time zone.

Pros:
(1) The change looks smaller to users and developers.
(2) Many SQL engines have adopted this approach.

Cons:
(1) Connector developers may be confused because the underlying value of TimestampData would need to change depending on the data type.
(2) I thought about this over the weekend and unfortunately found a bad case. The proposal is fine as long as we only stay in the Flink SQL world, but we also need to consider the conversion between Table and DataStream. Assume a record produced in the UTC+0 time zone with TIMESTAMP '1970-01-01 08:00:44', and a Flink SQL job that processes the data with session time zone UTC+8. If the job needs to convert the Table to a DataStream, we would compute the timestamp in the StreamRecord using the session time zone (UTC+8) and get 44 seconds in the DataStream program, which is wrong: the expected value is (8 * 60 * 60 + 44) seconds. This corner case tells us that ROWTIME/PROCTIME in Flink are based on UTC+0, so when correcting the PROCTIME() function, the better way is to use TIMESTAMP WITH LOCAL TIME ZONE, which keeps the same long value as the UTC+0-based time and can still be expressed in the local time zone.

Option (2): As we considered in the FLIP, and as @Timo suggested, change the return type to TIMESTAMP WITH LOCAL TIME ZONE, so that the expressed value depends on the local time zone.

Pros:
(1) It brings Flink SQL closer to the SQL standard.
(2) It handles the conversion between Table and DataStream well.

Cons:
(1) We need to discuss the return value/type of the CURRENT_TIME function.
(2) The change is bigger for users; we need to support TIMESTAMP WITH LOCAL TIME ZONE in connectors/formats as well as in custom connectors.
(3) TIMESTAMP WITH LOCAL TIME ZONE support is still weak in Flink, so some improvements are needed, but the workload does not matter as long as we are doing the right thing ^_^

Given the bad case above, option (1) is problematic.
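The bad case above can be sketched with java.time (the zones and the literal are the hypothetical values from the example, not Flink code):

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class ProctimeBadCase {
    public static void main(String[] args) {
        // The record's TIMESTAMP literal, produced in the UTC+0 time zone.
        LocalDateTime literal = LocalDateTime.parse("1970-01-01T08:00:44");

        // Expected epoch seconds, since the record was produced in UTC+0:
        long expected = literal.toInstant(ZoneOffset.UTC).getEpochSecond();      // 8 * 60 * 60 + 44 = 28844

        // If the Table-to-DataStream conversion re-interprets the literal with
        // the session time zone UTC+8, the StreamRecord timestamp becomes:
        long actual = literal.toInstant(ZoneOffset.ofHours(8)).getEpochSecond(); // 44

        System.out.println(expected + " vs " + actual); // the 8-hour shift is lost
    }
}
```

With TIMESTAMP WITH LOCAL TIME ZONE, the long value stays anchored to UTC+0 and only the rendering depends on the session zone, so this drift cannot happen.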
I think option (2) should be adopted, but we also need to consider a few problems:

(1) More conversion classes, such as LocalDateTime and java.sql.Timestamp, should be supported for LocalZonedTimestampType to resolve the UDF compatibility issue.
(2) The time zone offset for a window size of one day still needs to be considered.
(3) All connectors/formats should support TIMESTAMP WITH LOCAL TIME ZONE well, and we should also document this.

I'll update these sections of FLIP-162.

We also need to discuss the CURRENT_TIME function. I know the standard way is to use TIME WITH TIME ZONE (there is no TIME WITH LOCAL TIME ZONE), but we don't support that type yet and I don't see a strong motivation to support it so far. Compared to CURRENT_TIMESTAMP, CURRENT_TIME cannot represent an absolute time point; it should be regarded as a string consisting of a time in 'HH:mm:ss' format plus time zone info. We have several options for it:

(1) Forbid CURRENT_TIME, as @Timo proposed, so that all Flink SQL functions follow the standard well; in this case we need to offer some guidance for users upgrading Flink versions.
(2) Support it from the perspective of users who already use CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP; by the way, Snowflake also returns the TIME type.
(3) Return TIMESTAMP WITH LOCAL TIME ZONE to make it equal to CURRENT_TIMESTAMP, as Calcite does.

I can imagine (1), since we don't want to leave a bad smell in Flink SQL, and I can also accept (2), because I think users do not consider time zone issues when they use CURRENT_DATE/CURRENT_TIME, and the time zone info in a time value is not very useful. I don't have a strong opinion between them. What do others think?

I hope I've addressed your concerns. @Timo @Kurt

Best,
Leonard

> Most of the mature systems have a clear difference between CURRENT_TIMESTAMP > and LOCALTIMESTAMP. I wouldn't take Spark or Hive as a good example. > Snowflake decided for TIMESTAMP WITH LOCAL TIME ZONE. 
As I mentioned in the > last comment, I could also imagine this behavior for Flink. But in any case, > there should be some time zone information considered in order to cast to all > other types. > > >>> The function CURRENT_DATE/CURRENT_TIME is supporting in SQL standard, but > >>> LOCALDATE not, I don’t think it’s a good idea that dropping functions > >>> which > >>> SQL standard supported and introducing a replacement which SQL standard > >>> not > >>> reminded. > > We can still add those functions in the future. But since we don't offer a > TIME WITH TIME ZONE, it is better to not support this function at all for > now. And by the way, this is exactly the behavior that also Microsoft SQL > Server does: it also just supports CURRENT_TIMESTAMP (but it returns > TIMESTAMP without a zone which completes the confusion). > > >>> I also agree returning TIMESTAMP WITH LOCAL TIME ZONE for PROCTIME has > >>> more clear semantics, but I realized that user didn’t care the type but > >>> more about the expressed value they saw, and change the type from > >>> TIMESTAMP > >>> to TIMESTAMP WITH LOCAL TIME ZONE brings huge refactor that we need > >>> consider all places where the TIMESTAMP type used > > From a UDF perspective, I think nothing will change. The new type system and > type inference were designed to support all these cases. There is a reason > why Java has adopted Joda time, because it is hard to come up with a good > time library. That's why also we and the other Hadoop ecosystem folks have > decided for 3 different kinds of LocalDateTime, ZonedDateTime, and Instance. > It makes the library more complex, but time is a complex topic. > > I also doubt that many users work with only one time zone. Take the US as an > example, a country with 3 different timezones. Somebody working with US data > cannot properly see the data points with just LOCAL TIME ZONE. But on the > other hand, a lot of event data is stored using a UTC timestamp. 
> > > >> Before jumping into technique details, let's take a step back to discuss > >> user experience. > >> > >> The first important question is what kind of date and time will Flink > >> display when users call > >> CURRENT_TIMESTAMP and maybe also PROCTIME (if we think they are similar). > >> > >> Should it always display the date and time in UTC or in the user's time > >> zone? > > @Kurt: I think we all agree that the current behavior with just showing UTC > is wrong. Also, we all agree that when calling CURRENT_TIMESTAMP or PROCTIME > a user would like to see the time in it's current time zone. > > As you said, "my wall clock time". > > However, the question is what is the data type of what you "see". If you pass > this record on to a different system, operator, or different cluster, should > the "my" get lost or materialized into the record? > > TIMESTAMP -> completely lost and could cause confusion in a different system > > TIMESTAMP WITH LOCAL TIME ZONE -> at least the UTC is correct, so you can > provide a new local time zone > > TIMESTAMP WITH TIME ZONE -> also "your" location is persisted > > Regards, > Timo > > > > > On 22.01.21 09:38, Kurt Young wrote: >> Forgot one more thing. Continue with displaying in UTC. As a user, if Flink >> want to display the timestamp >> in UTC, why don't we offer something like UTC_TIMESTAMP? >> Best, >> Kurt >> On Fri, Jan 22, 2021 at 4:33 PM Kurt Young <ykt...@gmail.com> wrote: >>> Before jumping into technique details, let's take a step back to discuss >>> user experience. >>> >>> The first important question is what kind of date and time will Flink >>> display when users call >>> CURRENT_TIMESTAMP and maybe also PROCTIME (if we think they are similar). >>> >>> Should it always display the date and time in UTC or in the user's time >>> zone? I think this part is the >>> reason that surprised lots of users. 
If we forget about the type and >>> internal representation of these >>> two methods, as a user, my instinct tells me that these two methods should >>> display my wall clock time. >>> >>> Display time in UTC? I'm not sure, why I should care about UTC time? I >>> want to get my current timestamp. >>> For those users who have never gone abroad, they might not even be able to >>> realize that this is affected >>> by the time zone. >>> >>> Best, >>> Kurt >>> >>> >>> On Fri, Jan 22, 2021 at 12:25 PM Leonard Xu <xbjt...@gmail.com> wrote: >>> >>>> Thanks @Timo for the detailed reply, let's go on this topic on this >>>> discussion, I've merged all mails to this discussion. >>>> >>>>> LOCALDATE / LOCALTIME / LOCALTIMESTAMP >>>>> >>>>> --> uses session time zone, returns DATE/TIME/TIMESTAMP >>>> >>>>> >>>>> CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP >>>>> >>>>> --> uses session time zone, returns DATE/TIME/TIMESTAMP >>>>> >>>>> I'm very sceptical about this behavior. Almost all mature systems >>>> (Oracle, Postgres) and new high quality systems (Presto, Snowflake) use a >>>> data type with some degree of time zone information encoded. In a >>>> globalized world with businesses spanning different regions, I think we >>>> should do this as well. There should be a difference between >>>> CURRENT_TIMESTAMP and LOCALTIMESTAMP. And users should be able to choose >>>> which behavior they prefer for their pipeline. >>>> >>>> >>>> I know that the two series should be different at first glance, but >>>> different SQL engines can have their own explanations,for example, >>>> CURRENT_TIMESTAMP and LOCALTIMESTAMP are synonyms in Snowflake[1] and has >>>> no difference, and Spark only supports the later one and doesn’t support >>>> LOCALTIME/LOCALTIMESTAMP[2]. 
>>>> >>>> >>>>> If we would design this from scatch, I would suggest the following: >>>>> >>>>> - drop CURRENT_DATE / CURRENT_TIME and let users pick LOCALDATE / >>>> LOCALTIME for materialized timestamp parts >>>> >>>> The function CURRENT_DATE/CURRENT_TIME is supporting in SQL standard, but >>>> LOCALDATE not, I don’t think it’s a good idea that dropping functions which >>>> SQL standard supported and introducing a replacement which SQL standard not >>>> reminded. >>>> >>>> >>>>> - CURRENT_TIMESTAMP should return a TIMESTAMP WITH TIME ZONE to >>>> materialize all session time information into every record. It it the most >>>> generic data type and allows to cast to all other timestamp data types. >>>> This generic ability can be used for filter predicates as well either >>>> through implicit or explicit casting. >>>> >>>> TIMESTAMP WITH TIME ZONE indeed contains more information to describe a >>>> time point, but the type TIMESTAMP can cast to all other timestamp data >>>> types combining with session time zone as well, and it also can be used for >>>> filter predicates. For type casting between BIGINT and TIMESTAMP, I think >>>> the function way using TO_TIMEMTAMP()/FROM_UNIXTIMESTAMP() is more clear. >>>> >>>>> PROCTIME/ROWTIME should be time functions based on a long value. Both >>>> System.currentMillis() and our watermark system work on long values. Those >>>> should return TIMESTAMP WITH LOCAL TIME ZONE because the main calculation >>>> should always happen based on UTC. >>>>> We discussed it in a different thread, but we should allow PROCTIME >>>> globally. People need a way to create instances of TIMESTAMP WITH LOCAL >>>> TIME ZONE. This is not considered in the current design doc. >>>>> Many pipelines contain UTC timestamps and thus it should be easy to >>>> create one. 
>>>>> Also, both CURRENT_TIMESTAMP and LOCALTIMESTAMP can work with this type >>>> because we should remember that TIMESTAMP WITH LOCAL TIME ZONE accepts all >>>> timestamp data types as casting target [1]. We could allow TIMESTAMP WITH >>>> TIME ZONE in the future for ROWTIME. >>>>> In any case, windows should simply adapt their behavior to the passed >>>> timestamp type. And with TIMESTAMP WITH LOCAL TIME ZONE a day is defined by >>>> considering the current session time zone. >>>> >>>> I also agree returning TIMESTAMP WITH LOCAL TIME ZONE for PROCTIME has >>>> more clear semantics, but I realized that user didn’t care the type but >>>> more about the expressed value they saw, and change the type from TIMESTAMP >>>> to TIMESTAMP WITH LOCAL TIME ZONE brings huge refactor that we need >>>> consider all places where the TIMESTAMP type used, and many builtin >>>> functions and UDFs doest not support TIMESTAMP WITH LOCAL TIME ZONE type. >>>> That means both user and Flink devs need to refactor the code(UDF, builtin >>>> functions, sql pipeline), to be honest, I didn’t see strong motivation that >>>> we have to do the pretty big refactor from user’s perspective and >>>> developer’s perspective. >>>> >>>> In one word, both your suggestion and my proposal can resolve almost all >>>> user problems,the divergence is whether we need to spend pretty energy just >>>> to get a bit more accurate semantics? I think we need a tradeoff. >>>> >>>> >>>> Best, >>>> Leonard >>>> [1] >>>> https://trino.io/docs/current/functions/datetime.html#current_timestamp < >>>> https://trino.io/docs/current/functions/datetime.html#current_timestamp> >>>> [2] https://issues.apache.org/jira/browse/SPARK-30374 < >>>> https://issues.apache.org/jira/browse/SPARK-30374> >>>> >>>> >>>> >>>> >>>> >>>> >>>>> 2021-01-22,00:53,Timo Walther <twal...@apache.org> : >>>>> >>>>> Hi Leonard, >>>>> >>>>> thanks for working on this topic. I agree that time handling is not >>>> easy in Flink at the moment. 
We added new time data types (and some are >>>> still not supported which even further complicates things like TIME(9)). We >>>> should definitely improve this situation for users. >>>>> >>>>> This is a pretty opinionated topic and it seems that the SQL standard >>>> is not really deciding this but is at least supporting. So let me express >>>> my opinion for the most important functions: >>>>> >>>>> LOCALDATE / LOCALTIME / LOCALTIMESTAMP >>>>> >>>>> --> uses session time zone, returns DATE/TIME/TIMESTAMP >>>>> >>>>> I think those are the most obvious ones because the LOCAL indicates >>>> that the locality should be materialized into the result and any time zone >>>> information (coming from session config or data) is not important >>>> afterwards. >>>>> >>>>> CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP >>>>> >>>>> --> uses session time zone, returns DATE/TIME/TIMESTAMP >>>>> >>>>> I'm very sceptical about this behavior. Almost all mature systems >>>> (Oracle, Postgres) and new high quality systems (Presto, Snowflake) use a >>>> data type with some degree of time zone information encoded. In a >>>> globalized world with businesses spanning different regions, I think we >>>> should do this as well. There should be a difference between >>>> CURRENT_TIMESTAMP and LOCALTIMESTAMP. And users should be able to choose >>>> which behavior they prefer for their pipeline. >>>>> >>>>> If we would design this from scatch, I would suggest the following: >>>>> >>>>> - drop CURRENT_DATE / CURRENT_TIME and let users pick LOCALDATE / >>>> LOCALTIME for materialized timestamp parts >>>>> >>>>> - CURRENT_TIMESTAMP should return a TIMESTAMP WITH TIME ZONE to >>>> materialize all session time information into every record. It it the most >>>> generic data type and allows to cast to all other timestamp data types. >>>> This generic ability can be used for filter predicates as well either >>>> through implicit or explicit casting. 
>>>>> >>>>> PROCTIME/ROWTIME should be time functions based on a long value. Both >>>> System.currentMillis() and our watermark system work on long values. Those >>>> should return TIMESTAMP WITH LOCAL TIME ZONE because the main calculation >>>> should always happen based on UTC. We discussed it in a different thread, >>>> but we should allow PROCTIME globally. People need a way to create >>>> instances of TIMESTAMP WITH LOCAL TIME ZONE. This is not considered in the >>>> current design doc. Many pipelines contain UTC timestamps and thus it >>>> should be easy to create one. Also, both CURRENT_TIMESTAMP and >>>> LOCALTIMESTAMP can work with this type because we should remember that >>>> TIMESTAMP WITH LOCAL TIME ZONE accepts all timestamp data types as casting >>>> target [1]. We could allow TIMESTAMP WITH TIME ZONE in the future for >>>> ROWTIME. >>>>> >>>>> In any case, windows should simply adapt their behavior to the passed >>>> timestamp type. And with TIMESTAMP WITH LOCAL TIME ZONE a day is defined by >>>> considering the current session time zone. >>>>> >>>>> If we would like to design this with less effort required, we could >>>> think about returning TIMESTAMP WITH LOCAL TIME ZONE also for >>>> CURRENT_TIMESTAMP. >>>>> >>>>> >>>>> I will try to involve more people into this discussion. >>>>> >>>>> Thanks, >>>>> Timo >>>>> >>>>> [1] >>>> https://docs.oracle.com/en/database/oracle/oracle-database/21/sqlrf/Data-Types.html#GUID-E7CA339A-2093-4FE4-A36E-1D09593591D3 >>>> < >>>> https://docs.oracle.com/en/database/oracle/oracle-database/21/sqlrf/Data-Types.html#GUID-E7CA339A-2093-4FE4-A36E-1D09593591D3 >>>>> >>>> >>>> >>>> >>>>> 2021-01-21,22:32,Leonard Xu <xbjt...@gmail.com> : >>>>>> Before the changes, as I am writing this reply, the local time here is >>>> 2021-01-21 12:03:35 (Beijing time, UTC+8). 
>>>>>> And I tried these 5 functions in sql client, and got: >>>>>> >>>>>> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE, >>>> CURRENT_TIME; >>>>>> >>>> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >>>>>> | EXPR$0 | EXPR$1 | >>>> CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME | >>>>>> >>>> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >>>>>> | 2021-01-21T04:03:35.228 | 2021-01-21T04:03:35.228 | >>>> 2021-01-21T04:03:35.228 | 2021-01-21 | 04:03:35.228 | >>>>>> >>>> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >>>>>> After the changes, the expected behavior will change to: >>>>>> >>>>>> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE, >>>> CURRENT_TIME; >>>>>> >>>> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >>>>>> | EXPR$0 | EXPR$1 | >>>> CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME | >>>>>> >>>> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >>>>>> | 2021-01-21T12:03:35.228 | 2021-01-21T12:03:35.228 | >>>> 2021-01-21T12:03:35.228 | 2021-01-21 | 12:03:35.228 | >>>>>> >>>> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >>>>>> The return type of now(), proctime() and CURRENT_TIMESTAMP still be >>>> TIMESTAMP; >>>>> >>>>> To Kurt, thanks for the intuitive case, it really clear, you’re wright >>>> that I want to propose to change the return value of these functions. It’s >>>> the most important part of the topic from user's perspective. >>>>> >>>>>> I think this definitely deserves a FLIP. >>>>> To Jark, nice suggestion, I prepared a FLIP for this topic, and will >>>> start the FLIP discussion soon. 
>>>>> >>>>>>> If use the default Flink SQL, the window time range of the >>>>>>> statistics is incorrect, then the statistical results will naturally >>>> be >>>>>>> incorrect. >>>>> To zhisheng, sorry to hear that this problem influenced your production >>>> jobs, Could you share your SQL pattern? we can have more inputs and try >>>> to resolve them. >>>>> >>>>> >>>>> Best, >>>>> Leonard >>>> >>>> >>>> >>>>> 2021-01-21,14:19,Jark Wu <imj...@gmail.com> : >>>>> >>>>> Great examples to understand the problem and the proposed changes, >>>> @Kurt! >>>>> >>>>> Thanks Leonard for investigating this problem. >>>>> The time-zone problems around time functions and windows have bothered a >>>>> lot of users. It's time to fix them! >>>>> >>>>> The return value changes sound reasonable to me, and keeping the return >>>>> type unchanged will minimize the surprise to the users. >>>>> Besides that, I think it would be better to mention how this affects the >>>>> window behaviors, and the interoperability with DataStream. >>>>> >>>>> I think this definitely deserves a FLIP. >>>>> >>>>> ==================================================== >>>>> >>>>> Hi zhisheng, >>>>> >>>>> Do you have examples to illustrate which case will get the wrong window >>>>> boundaries? >>>>> That will help to verify whether the proposed changes can solve your >>>>> problem. >>>>> >>>>> Best, >>>>> Jark >>>> >>>> >>>> >>>> >>>>> 2021-01-21,12:54,zhisheng <173855...@qq.com> : >>>>> >>>>> Thanks to Leonard Xu for discussing this tricky topic. At present, >>>> there are many Flink jobs in our production environment that are used to >>>> count day-level reports (eg: count PV/UV ). >>>>> >>>>> If use the default Flink SQL, the window time range of the >>>> statistics is incorrect, then the statistical results will naturally be >>>> incorrect. >>>>> >>>>> The user needs to deal with the time zone manually in order to solve >>>> the problem. 
>>>>> >>>>> If Flink itself can solve these time zone issues, then I think it will >>>> be user-friendly. >>>>> >>>>> Thank you >>>>> >>>>> Best!; >>>>> zhisheng >>>> >>>> >>>> >>>> >>>>> 2021-01-21,12:11,Kurt Young <ykt...@gmail.com> : >>>>> >>>>> cc this to user & user-zh mailing list because this will affect lots of >>>> users, and also quite a lot of users >>>>> were asking questions around this topic. >>>>> >>>>> Let me try to understand this from user's perspective. >>>>> >>>>> Your proposal will affect five functions, which are: >>>>> PROCTIME() >>>>> NOW() >>>>> CURRENT_DATE >>>>> CURRENT_TIME >>>>> CURRENT_TIMESTAMP >>>>> Before the changes, as I am writing this reply, the local time here is >>>> 2021-01-21 12:03:35 (Beijing time, UTC+8). >>>>> And I tried these 5 functions in sql client, and got: >>>>> >>>>> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE, >>>> CURRENT_TIME; >>>>> >>>> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >>>>> | EXPR$0 | EXPR$1 | >>>> CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME | >>>>> >>>> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >>>>> | 2021-01-21T04:03:35.228 | 2021-01-21T04:03:35.228 | >>>> 2021-01-21T04:03:35.228 | 2021-01-21 | 04:03:35.228 | >>>>> >>>> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >>>>> After the changes, the expected behavior will change to: >>>>> >>>>> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE, >>>> CURRENT_TIME; >>>>> >>>> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >>>>> | EXPR$0 | EXPR$1 | >>>> CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME | >>>>> >>>> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >>>>> | 
2021-01-21T12:03:35.228 | 2021-01-21T12:03:35.228 | >>>> 2021-01-21T12:03:35.228 | 2021-01-21 | 12:03:35.228 | >>>>> >>>> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >>>>> The return type of now(), proctime() and CURRENT_TIMESTAMP still be >>>> TIMESTAMP; >>>>> >>>>> Best, >>>>> Kurt >>>> >>>> >>>> >>>> >>>> >>>> >>>> >