Time zone issues seem to be completely fixed. I also added the documentation. http://tajo.apache.org/docs/devel/time_zone.html
Best regards, Hyunsik On Mon, Dec 8, 2014 at 10:52 AM, Hyunsik Choi <[email protected]> wrote: > Hi Jaewoong and CharSyam, > > I submitted the patch that fixes timezone-related problems discussed > here. I rearranged all time zone usages throughout Tajo. > https://issues.apache.org/jira/browse/TAJO-1234 > > Thanks to the discussion of this mailing list with you guys, I was > clear enough to find or fix the problems. > > Warm regards, > Hyunsik > > On Tue, Nov 25, 2014 at 5:54 PM, Hyunsik Choi <[email protected]> wrote: >> I'm happy to work with you and fix a major nuisance :) Later, I'll >> share the timezone related problem with you when I found additional >> bugs. >> >> Warm regards, >> Hyunsik >> >> On Tue, Nov 25, 2014 at 4:17 PM, Jaewoong Jung <[email protected]> wrote: >>> Everything is very clear now thanks to your explanation. :) >>> >>> Okay, then I'll fix the issue by making DateDatum timezone-neutral and >>> TimeDatum UTC-based. Also, I'll play with PostgreSQL to understand its >>> timezone model better. >>> >>> Meanwhile, please feel free to assign timezone-related bugs to me as >>> you see fit. >>> >>> On Mon, Nov 24, 2014 at 5:52 PM, Hyunsik Choi <[email protected]> wrote: >>>> Thank you all guys for your comments. >>>> >>>> Jaewoong, >>>> >>>> I leave inline comments. If my answers are not enough for your >>>> question or I misunderstood your question, please feel free to ask >>>> additional questions. >>>> >>>> Best regards, >>>> Hyunsik >>>> >>>> On Fri, Nov 21, 2014 at 12:50 AM, Jaewoong Jung <[email protected]> wrote: >>>>> There's another issue that hopefully Hyunsik would be able to clarify, >>>>> and it's very crucial to handling timezones in these data types. >>>>> >>>>> Q: So, let's say (and I agree) TimeDatum represents an instant, so can >>>>> be timezoned. Then, is it a UTC time or a local time? >>>>> >>>> >>>> TimeDatum is UTC or should be UTC if some parts are not. >>>> >>>>> Let me explain why this question is important. >>>>> >>>>> DateDatum represents an instant. And, it is implicitly timezoned to >>>>> the user local time. Which means, if I use '11/20/2014', it's >>>>> '11/20/2014 0:00:00 PST' and is equivalent to a TimestampDatum for >>>>> '11/20/2014 8:00:00 UTC'. (BTW, the compareTo implementation for >>>>> DateDatum is broken in this regard. I'll file a separate issue for >>>>> that.) >>>> >>>> I agree with your proposal. I also think that DateDatum does not need >>>> to be timezoned. We should keep it as DateDatum instead of >>>> TimestampDatum. Thank you for nice finding! >>>> >>>>> What about TimeDatum? It is currently timezoned to UTC, (surprise, >>>>> anyone?) if I understood the code correctly. When we add a TimeDatum >>>>> to a DateDatum, we convert the TimeDatum to the user local time, which >>>>> implies TimeDatum is UTC-based. (Also, the comment next to the line >>>>> explicitly mentions it.) >>>> >>>> You are right. TimeDatum represents UTC time. >>>> >>>> FYI, I'd like to describe additional background. There are only two >>>> entry points to take time or timestamp values. One is records in input >>>> tables, and another is SQL statements. Currently, Input table uses the >>>> system global timezone specified in TajoConf (tajo-site.xml file). >>>> Later, we will add one table property to allow users to specify >>>> timezone for each table. For SQL statement (e.g., SELECT time >>>> '03:00:00'), we will use client timezone. Also, we will provide some >>>> expression to allow users to specify timezone for time or timestamp in >>>> SQL statements. >>>> >>>> Consequently, only two entry points have to deal with timezone for >>>> Timestamp and Time. Other parts in Tajo should deal with all values in >>>> UTC. >>>> >>>>> >>>>> So, here's the problem. >>>>> >>>>> Why do they have different timezones? They're incomplete as an instant >>>>> when used alone and are complementary to each other. This is an >>>>> important concept. To understand it, you have to think about why >>>>> adding DateDatum and TimeDatum is allowed in the code. Originally, >>>>> instants can't be added. (And, that's why I thought TimeDatum is not >>>>> an instant.) You can't add (say) 11/20/2014 to 11/23/2019. Subtracting >>>>> an instant from an instant makes sense and results in a period, but >>>>> they can't be added. >>>>> >>>>> However, in the case of DateDatum and TimeDatum, additions are allowed >>>>> because they're complementary to each other, and what the code does >>>>> **conceptually** is concatenate the two. >>>> >>>> Great insight! So far, I haven't thought it. >>>> >>>>> Therefore, because they're intended to be used together, I'd argue >>>>> they shouldn't have different timezones. Also, if they have different >>>>> timezones, additions can't have a simple correct answer. What's the >>>>> correct answer of 11/20/2014 (PST) + 8:00:00 (UTC)? There's no clear >>>>> answer because they can't be simply concatenated. >>>>> >>>>> (FWIW, the current Tajo code thinks the answer is 11/20/2014 8:00:00 >>>>> in UTC. How many of you got it right?) >>>> >>>> You are definitely right :) If they have different timezone, the >>>> problem becomes very complicated. Nobody wants it :) As I mentioned, >>>> Timezone problem of Timestamp and Time data types should be addressed >>>> in two entry points and client. We need to keep the processing >>>> approach simple. >>>> >>>>> This can cause a lot of confusion to users. When they use a date >>>>> alone, it is interpreted as a local time. But, as soon as they add a >>>>> time to it, it is silently converted to UTC in a way which is very >>>>> unexpected to many users. >>>>> >>>>> Why am I emphasizing it is unexpected? Look at the comparison below. >>>>> >>>>> What's the answer to 11/20/2014 (DateDatum) + 8 hours (IntervalDatum)? >>>>> It's 11/20/2014 8:00 in a local time (PST on my machine). How about >>>>> 11/20/2014 (DateDatum) + 8:00:00 (TimeDatum)? It's 11/20/2014 8:00 in >>>>> UTC as I wrote above. How many users would be able to expect this? >>>> >>>> They are definitely a bug. We follow PostgreSQL in all aspects, and >>>> the following results come from the PostgreSQL. The results of two >>>> operations are the same. >>>> >>>> hyunsik=> SELECT date '11/20/2014' + time '08:00'; >>>> ?column? >>>> --------------------- >>>> 2014-11-20 08:00:00 >>>> (1 row) >>>> >>>> hyunsik=> select date '11/20/2014' + interval '8 hrs'; >>>> ?column? >>>> --------------------- >>>> 2014-11-20 08:00:00 >>>> (1 row) >>>> >>>> >>>>> >>>>> So, coming back to my original question. What timezone should a >>>>> TimeDatum have? UTC or local time? It's currently UTC. But, I believe >>>>> it should be changed to the local time zone. >>>> >>>> TimeDatum is UTC. >>>> >>>> In sum, we should keep both TimeDatum and TimestampDatum UTC values. >>>> Then, we should address timezone offsets in two entry points and >>>> client side. >>>> >>>>> >>>>> (Sorry for the long email. But, I think it's critical to get this >>>>> right and build consensus over it so that we can provide a consistent >>>>> behavior going forward. The actual fix will be really simple, though.) >>>>> >>>>> On Thu, Nov 20, 2014 at 2:44 AM, Hyunsik Choi <[email protected]> >>>>> wrote: >>>>>> Those parts have poor documentation. >>>>>> >>>>>> I agree with your investigation. I also could find many misuse of >>>>>> timezone >>>>>> in many parts. We should make them clear and fix them in this chance. >>>>>> >>>>>> I just got off the plane, and I'm still on the road. So, I'll give more >>>>>> comments tomorrow. >>>>>> >>>>>> - hyunsik >>>>>> >>>>>> On Thu, Nov 20, 2014 at 5:24 PM, Jaewoong Jung <[email protected]> wrote: >>>>>> >>>>>>> Wow, this seemingly trivial issue has surprisingly many problems >>>>>>> involved. >>>>>>> >>>>>>> The most critical one, though, is TimeMeta class. Presumably because >>>>>>> it is poorly documented, it is being used for two different purposes >>>>>>> in Tajo code base. Some code treats it as a date time representation, >>>>>>> which I believe is the original intention, but some treat it as the >>>>>>> human-readable equivalent of TimeDatum by completely ignoring >>>>>>> date-related fields. >>>>>>> >>>>>>> For example, DateTimeUtil.date2j(long julianDate, TimeMeta tm), which >>>>>>> converts a julian timestamp to a TimeMeta value, doesn't touch >>>>>>> dayOfMonth, monthOfYear, or years values and just puts all values for >>>>>>> hour and above units in hours field. >>>>>>> >>>>>>> There are other minor problems like incorrect comments and absent >>>>>>> default values, but the most critical one is misuse of TimeMeta. >>>>>>> >>>>>>> I'll try to break up my patches so that each has a clear and >>>>>>> easy-to-understand goal. Sending this heads-up to let you know >>>>>>> there'll be more issues filed and patches sent than you might expect. >>>>>>> >>>>>>> On Wed, Nov 19, 2014 at 10:56 PM, Jaewoong Jung <[email protected]> >>>>>>> wrote: >>>>>>> > Yeah, after some more research, I found that TImeDatum is a somewhat >>>>>>> > ambiguous data type. >>>>>>> > >>>>>>> > Its original purpose is to represent a time of a day, i.e. hh:mm:ss >>>>>>> > part of an instant. So, it could be viewed as instant data though some >>>>>>> > may argue it's incomplete to fully represent an instant. Anyway, given >>>>>>> > that TimestampDatum has limitation in terms of the time range it can >>>>>>> > represent, and given that Tajo doesn't have DateTime data type, >>>>>>> > clients should be allowed to use it with a timezone. >>>>>>> > >>>>>>> > I'll change the direction and try to address the issue by fixing the >>>>>>> > underflow error. >>>>>>> > >>>>>>> > Thanks for the input. :) >>>>>>> > >>>>>>> > On Wed, Nov 19, 2014 at 10:24 PM, Jong-young Park <[email protected]> >>>>>>> wrote: >>>>>>> >> Hi, Jaewoong. >>>>>>> >> >>>>>>> >> To express time period value, IntervalDatum is existing as I know. >>>>>>> >> >>>>>>> >> So I think it is right that TimeDatum is for some time point. >>>>>>> >> >>>>>>> >> And TimestampDatum seems it is doing both roles of DateDatum and >>>>>>> TimeDatum. >>>>>>> >> >>>>>>> >> Regards, >>>>>>> >> Jongyoung. >>>>>>> >> >>>>>>> >> On Wed Nov 19 2014 at 오후 5:23:40 Jaewoong Jung <[email protected]> >>>>>>> wrote: >>>>>>> >> >>>>>>> >>> It turns out, TAJO-1191 is slightly more complicated than I thought. >>>>>>> >>> (https://issues.apache.org/jira/browse/TAJO-1191) >>>>>>> >>> >>>>>>> >>> Basically, it's about whether TimeDatum may have a timezone tied >>>>>>> >>> with >>>>>>> >>> it. I **believe** TimeDatum is originally designed to hold a time >>>>>>> >>> period value, not an instant. (TimestampDatum seems to be the >>>>>>> >>> canonical container for instants.) So, it doesn't make sense to >>>>>>> >>> apply >>>>>>> >>> any timezones to TimeDatum values, but it's being done in a few >>>>>>> >>> places. And, that's why the test is failing on my machines. >>>>>>> >>> >>>>>>> >>> I'm going to try to fix it by removing all timezone-related >>>>>>> >>> references >>>>>>> >>> around the class, but I want to check my assumption with you before >>>>>>> >>> I >>>>>>> >>> proceed. >>>>>>> >>> >>>>>>> >>> What do you think about it? >>>>>>> >>> >>>>>>>
