[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401155#comment-13401155 ]
Zhijie Shen commented on PIG-1314:
----------------------------------

Dear Thejas and Russell,

{quote}
1) Don't persist DateTimes as ints/longs unless you also persist a timezone offset with it somehow (is this possible?).

I forgot about timezone. We need to serialize the timezone information as well, while supporting the same range of dates as JodaTime. With int/long this will not be possible. (Zhijie can you confirm ?)
{quote}

As far as I know, both the Java built-in Date and the Joda DateTime use a millisecond offset (stored in a long variable) from midnight UTC on 1970-01-01, which is not exactly Unix time because it counts milliseconds rather than seconds. Importantly, the millisecond offset has nothing to do with the time zone. For example, both

new DateTime(9223372017043199999L, DateTimeZone.UTC).getMillis();

and

new DateTime(9223372017043199999L, DateTimeZone.forID("Asia/Singapore")).getMillis();

will return the same value, that is, 9223372017043199999L. The time zone only determines the ISO time string, so the two DateTime objects will output different ISO time strings when toString() is called. Hence I think the long variable that represents the millisecond offset is good for internal serialization.

When we need to convert the DateTime object to a time string, we may use the default time zone of the Pig environment (I'm still working on this. Please let me know how you think the Pig-wide time zone should be set.) or a user-defined time zone (we probably need one more UDF, String ToString(DateTime d, String format, String timezone)).

As to the Pig DateTime, the internal Joda DateTime object is created either with the long millisecond offset or with an ISO time string. Initialization with a long variable (from Long.MIN_VALUE to Long.MAX_VALUE) has no range problem: getMillis() returns a result that also ranges from Long.MIN_VALUE to Long.MAX_VALUE.
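To make the time zone point above concrete, here is a minimal sketch. It uses the JDK's java.time classes as a stand-in for Joda-Time so that it runs without extra dependencies (that substitution is an assumption of this sketch, not what the patch uses), and the class and method names are hypothetical. It attaches two different zones to the same millisecond value and reads the value back.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Hypothetical illustration class; java.time stands in for Joda-Time here.
public class MillisZoneSketch {

    // Attach a time zone to a millisecond offset from 1970-01-01T00:00:00Z.
    public static ZonedDateTime at(long millis, ZoneId zone) {
        return Instant.ofEpochMilli(millis).atZone(zone);
    }

    // Recover the millisecond offset; the attached zone plays no part in it.
    public static long millisOf(ZonedDateTime dateTime) {
        return dateTime.toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        long millis = 1340000000000L; // an arbitrary instant in June 2012

        ZonedDateTime utc = at(millis, ZoneOffset.UTC);
        ZonedDateTime singapore = at(millis, ZoneId.of("Asia/Singapore"));

        // Same instant, so the serialized long is identical for both objects.
        System.out.println(millisOf(utc) == millisOf(singapore));

        // Only the textual rendering depends on the zone.
        System.out.println(utc);
        System.out.println(singapore);
    }
}
```

If the zone itself must survive a round trip, it would have to be stored next to the long (for example as a zone ID string); the long alone only identifies the instant.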
Initialization with an ISO time string is more restricted: the Joda DateTime object only accepts years in the range [-292275054, 292278993], so the corresponding millisecond offset is also within [Long.MIN_VALUE, Long.MAX_VALUE]. In summary, the range will be fine when a long is used for serialization. Please correct me if I'm wrong. Thanks a lot!

{quote}
2) Consider using jodatime/ISO8601 durations for date math, as a separate type. i.e. If this extends scope too far, save it for later. http://en.wikipedia.org/wiki/ISO_8601#Durations

+1. This is much cleaner. Let's replace the Add* functions with just AddDuration. For example, AddDuration(d1, "P3Y") would return d1 + 3 years.
{quote}

+1. This way, users have more flexibility in defining the amount of time to add or subtract. Since the ISO duration is non-negative (please correct me if I'm wrong), we need SubtractDuration as well.

> Add DateTime Support to Pig
> ---------------------------
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
> Issue Type: Bug
> Components: data
> Affects Versions: 0.7.0
> Reporter: Russell Jurney
> Assignee: Zhijie Shen
> Labels: gsoc2012
> Attachments: PIG-1314-1.patch, PIG-1314-2.patch, joda_vs_builtin.zip
>
> Original Estimate: 672h
> Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a
> timestamp component. Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?
> We're looking at doing this, rather than use UDFs. Is this a patch that
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information
> about the program can be found at
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012

--
This message is automatically generated by JIRA.