[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401155#comment-13401155
 ] 

Zhijie Shen commented on PIG-1314:
----------------------------------

Dear Thejas and Russell,

{quote}
1) Don't persist DateTimes as ints/longs unless you also persist a timezone 
offset with it somehow (is this possible?).
I forgot about timezone. We need to serialize the timezone information as well, 
while supporting the same range of dates as JodaTime . With int/long this will 
not be possible. (Zhijie can you confirm ?)
{quote}

As far as I know, both Java's built-in Date and Joda DateTime store a 
millisecond-shift (in a long variable) from midnight UTC on January 1, 1970, 
which is not exactly Unix time (Unix time is counted in seconds). Importantly, 
the millisecond-shift has nothing to do with the time zone. For example, both

new DateTime(9223372017043199999L, DateTimeZone.UTC).getMillis();

and

new DateTime(9223372017043199999L, 
DateTimeZone.forID("Asia/Singapore")).getMillis();

will return the same value, that is, 9223372017043199999L. The time zone only 
determines the ISO time string, so the two DateTime objects output different 
ISO time strings when toString() is called. Hence I think the long variable 
that represents the millisecond-shift is good for internal serialization. When 
we need to convert the DateTime object to a human-readable time string, we may 
use either the default time zone of the Pig environment (I'm still working on 
this. Please let me know how you think the Pig-wide time zone should be set.) 
or a user-defined time zone (we probably need one more UDF: String 
ToString(DateTime d, String format, String timezone)).
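To illustrate the point about the millisecond-shift being independent of the 
time zone, here is a minimal sketch. It uses the JDK's java.time classes as a 
stand-in for Joda (an assumption made so the example runs without extra 
libraries); the class and method names are hypothetical and are not part of 
the patch.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical demo class; java.time is used here in place of Joda-Time.
class EpochMillisZoneDemo {

    // Attach a time zone to the instant, then read the epoch millis back.
    static long millisInZone(long epochMillis, String zoneId) {
        ZonedDateTime zoned = Instant.ofEpochMilli(epochMillis).atZone(ZoneId.of(zoneId));
        return zoned.toInstant().toEpochMilli();
    }

    // Render the same instant as an ISO-style string in the given zone.
    static String isoInZone(long epochMillis, String zoneId) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneId.of(zoneId)).toString();
    }

    public static void main(String[] args) {
        long millis = 1340668800000L; // midnight UTC on 2012-06-26

        // The millisecond value is identical regardless of the zone.
        System.out.println(millisInZone(millis, "UTC"));
        System.out.println(millisInZone(millis, "Asia/Singapore"));

        // Only the textual rendering differs between zones.
        System.out.println(isoInZone(millis, "UTC"));
        System.out.println(isoInZone(millis, "Asia/Singapore"));
    }
}
```

Only the string rendering depends on the zone, so a ToString UDF with an 
explicit timezone argument would cover the cases where the Pig-wide default 
is not what the user wants.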

As for Pig DateTime, the internal Joda DateTime object is created either with 
the long variable of the millisecond-shift or with an ISO time string. 
Initialization with a long variable (from Long.MIN_VALUE to Long.MAX_VALUE) 
has no range problem: getMillis() returns a result that also lies between 
Long.MIN_VALUE and Long.MAX_VALUE. For initialization with an ISO time string, 
the Joda DateTime object only accepts a year in the range 
[-292275054,292278993], so the corresponding millisecond-shift is also within 
[Long.MIN_VALUE, Long.MAX_VALUE]. In summary, the range will be fine when Long 
is used for serialization.
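As a rough sketch of what serializing the millisecond-shift as a long could 
look like, the following round-trips a value through DataOutputStream and 
DataInputStream. The class and method names are hypothetical, and the sketch 
does not claim to match how the patch or Pig's own binary storage writes the 
value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper; illustrates an 8-byte long round trip only.
class MillisSerializationSketch {

    // Write the millisecond-shift as a single big-endian long (8 bytes).
    static byte[] write(long epochMillis) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(epochMillis);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Read the millisecond-shift back from the 8-byte encoding.
    static long read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readLong();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The extremes of the long range survive the round trip unchanged.
        System.out.println(read(write(Long.MIN_VALUE)) == Long.MIN_VALUE);
        System.out.println(read(write(Long.MAX_VALUE)) == Long.MAX_VALUE);
    }
}
```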

Please correct me if I'm wrong. Thanks a lot!

{quote}
2) Consider using jodatime/ISO8601 durations for date math, as a separate type. 
i.e. If this extends scope too far, save it for later. 
http://en.wikipedia.org/wiki/ISO_8601#Durations
+1 . This is much cleaner. Lets use replace the Add* functions with just 
AddDuration . For example AddDuration(d1, "P3Y"), would return d1 + 3 years.
{quote}

+1. This way, it is more flexible for users to define the amount of time to 
add or subtract. Since the ISO duration is non-negative (please correct me if 
I'm wrong), we need a SubtractDuration function as well.
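
A minimal sketch of how such a pair of functions might behave, assuming 
java.time's Period as a stand-in for a Joda period and covering date-based 
durations such as "P3Y" only. The names addDuration and subtractDuration are 
hypothetical and are not the UDF signatures from the patch.

```java
import java.time.OffsetDateTime;
import java.time.Period;

// Hypothetical helper; java.time.Period handles date-based ISO 8601
// durations (years, months, weeks, days) but not time-based ones like PT1H.
class DurationMathSketch {

    // Return d shifted forward by the ISO 8601 duration, e.g. "P3Y".
    static OffsetDateTime addDuration(OffsetDateTime d, String isoDuration) {
        return d.plus(Period.parse(isoDuration));
    }

    // Return d shifted backward by the ISO 8601 duration.
    static OffsetDateTime subtractDuration(OffsetDateTime d, String isoDuration) {
        return d.minus(Period.parse(isoDuration));
    }

    public static void main(String[] args) {
        OffsetDateTime d1 = OffsetDateTime.parse("2012-06-26T00:00:00Z");
        System.out.println(addDuration(d1, "P3Y"));      // d1 + 3 years
        System.out.println(subtractDuration(d1, "P3Y")); // d1 - 3 years
    }
}
```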
                
> Add DateTime Support to Pig
> ---------------------------
>
>                 Key: PIG-1314
>                 URL: https://issues.apache.org/jira/browse/PIG-1314
>             Project: Pig
>          Issue Type: Bug
>          Components: data
>    Affects Versions: 0.7.0
>            Reporter: Russell Jurney
>            Assignee: Zhijie Shen
>              Labels: gsoc2012
>         Attachments: PIG-1314-1.patch, PIG-1314-2.patch, joda_vs_builtin.zip
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012
