There's another issue that hopefully Hyunsik would be able to clarify, and it's very crucial to handling timezones in these data types.
Q: So, let's say (and I agree) TimeDatum represents an instant, so can be timezoned. Then, is it a UTC time or a local time? Let me explain why this question is important. DateDatum represents an instant. And, it is implicitly timezoned to the user local time. Which means, if I use '11/20/2014', it's '11/20/2014 0:00:00 PST' and is equivalent to a TimestampDatum for '11/20/2014 8:00:00 UTC'. (BTW, the compareTo implementation for DateDatum is broken in this regard. I'll file a separate issue for that.) What about TimeDatum? It is currently timezoned to UTC, (surprise, anyone?) if I understood the code correctly. When we add a TimeDatum to a DateDatum, we convert the TimeDatum to the user local time, which implies TimeDatum is UTC-based. (Also, the comment next to the line explicitly mentions it.) So, here's the problem. Why do they have different timezones? They're incomplete as an instant when used alone and are complementary to each other. This is an important concept. To understand it, you have to think about why adding DateDatum and TimeDatum is allowed in the code. Originally, instants can't be added. (And, that's why I thought TimeDatum is not an instant.) You can't add (say) 11/20/2014 to 11/23/2019. Subtracting an instant from an instant makes sense and results in a period, but they can't be added. However, in the case of DateDatum and TimeDatum, additions are allowed because they're complementary to each other, and what the code does **conceptually** is concatenate the two. Therefore, because they're intended to be used together, I'd argue they shouldn't have different timezones. Also, if they have different timezones, additions can't have a simple correct answer. What's the correct answer of 11/20/2014 (PST) + 8:00:00 (UTC)? There's no clear answer because they can't be simply concatenated. (FWIW, the current Tajo code thinks the answer is 11/20/2014 8:00:00 in UTC. How many of you got it right?) This can cause a lot of confusion to users. When they use a date alone, it is interpreted as a local time. But, as soon as they add a time to it, it is silently converted to UTC in a way which is very unexpected to many users. Why am I emphasizing it is unexpected? Look at the comparison below. What's the answer to 11/20/2014 (DateDatum) + 8 hours (IntervalDatum)? It's 11/20/2014 8:00 in a local time (PST on my machine). How about 11/20/2014 (DateDatum) + 8:00:00 (TimeDatum)? It's 11/20/2014 8:00 in UTC as I wrote above. How many users would be able to expect this? So, coming back to my original question. What timezone should a TimeDatum have? UTC or local time? It's currently UTC. But, I believe it should be changed to the local time zone. (Sorry for the long email. But, I think it's critical to get this right and build consensus over it so that we can provide a consistent behavior going forward. The actual fix will be really simple, though.) On Thu, Nov 20, 2014 at 2:44 AM, Hyunsik Choi <[email protected]> wrote: > Those parts have poor documentation. > > I agree with your investigation. I also could find many misuse of timezone > in many parts. We should make them clear and fix them in this chance. > > I just got off the plane, and I'm still on the road. So, I'll give more > comments tomorrow. > > - hyunsik > > On Thu, Nov 20, 2014 at 5:24 PM, Jaewoong Jung <[email protected]> wrote: > >> Wow, this seemingly trivial issue has surprisingly many problems involved. >> >> The most critical one, though, is TimeMeta class. Presumably because >> it is poorly documented, it is being used for two different purposes >> in Tajo code base. Some code treats it as a date time representation, >> which I believe is the original intention, but some treat it as the >> human-readable equivalent of TimeDatum by completely ignoring >> date-related fields. >> >> For example, DateTimeUtil.date2j(long julianDate, TimeMeta tm), which >> converts a julian timestamp to a TimeMeta value, doesn't touch >> dayOfMonth, monthOfYear, or years values and just puts all values for >> hour and above units in hours field. >> >> There are other minor problems like incorrect comments and absent >> default values, but the most critical one is misuse of TimeMeta. >> >> I'll try to break up my patches so that each has a clear and >> easy-to-understand goal. Sending this heads-up to let you know >> there'll be more issues filed and patches sent than you might expect. >> >> On Wed, Nov 19, 2014 at 10:56 PM, Jaewoong Jung <[email protected]> wrote: >> > Yeah, after some more research, I found that TImeDatum is a somewhat >> > ambiguous data type. >> > >> > Its original purpose is to represent a time of a day, i.e. hh:mm:ss >> > part of an instant. So, it could be viewed as instant data though some >> > may argue it's incomplete to fully represent an instant. Anyway, given >> > that TimestampDatum has limitation in terms of the time range it can >> > represent, and given that Tajo doesn't have DateTime data type, >> > clients should be allowed to use it with a timezone. >> > >> > I'll change the direction and try to address the issue by fixing the >> > underflow error. >> > >> > Thanks for the input. :) >> > >> > On Wed, Nov 19, 2014 at 10:24 PM, Jong-young Park <[email protected]> >> wrote: >> >> Hi, Jaewoong. >> >> >> >> To express time period value, IntervalDatum is existing as I know. >> >> >> >> So I think it is right that TimeDatum is for some time point. >> >> >> >> And TimestampDatum seems it is doing both roles of DateDatum and >> TimeDatum. >> >> >> >> Regards, >> >> Jongyoung. >> >> >> >> On Wed Nov 19 2014 at 오후 5:23:40 Jaewoong Jung <[email protected]> >> wrote: >> >> >> >>> It turns out, TAJO-1191 is slightly more complicated than I thought. >> >>> (https://issues.apache.org/jira/browse/TAJO-1191) >> >>> >> >>> Basically, it's about whether TimeDatum may have a timezone tied with >> >>> it. I **believe** TimeDatum is originally designed to hold a time >> >>> period value, not an instant. (TimestampDatum seems to be the >> >>> canonical container for instants.) So, it doesn't make sense to apply >> >>> any timezones to TimeDatum values, but it's being done in a few >> >>> places. And, that's why the test is failing on my machines. >> >>> >> >>> I'm going to try to fix it by removing all timezone-related references >> >>> around the class, but I want to check my assumption with you before I >> >>> proceed. >> >>> >> >>> What do you think about it? >> >>> >>
