[
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401613#comment-13401613
]
Thejas M Nair commented on PIG-1314:
------------------------------------
bq. As far as I know, either Java builtin Date or Joda DateTime uses
millisecond-shift (stored in a long integer variable) from the midnight UTC,
which is not exactly the Unix time.
Yes, as you noted, the difference is that a 64-bit Unix timestamp counted in
seconds can store up to +/- 292 billion years, while Joda DateTime, which
counts milliseconds, supports only +/- 292 million years. That should be
sufficient for most practical purposes! :)
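The two ranges follow from dividing the largest signed 64-bit value by the
number of seconds (or milliseconds) in a year. A minimal sketch of that
arithmetic, assuming an average Gregorian year of 365.2425 days; the class and
method names are made up for illustration:

```java
class TimestampRange {
    // Assumption: average Gregorian year of 365.2425 days, 86,400 seconds per day.
    static final double SECONDS_PER_YEAR = 365.2425 * 86_400;

    /** Years reachable on either side of the epoch by a signed 64-bit count of the given unit. */
    static double yearsCovered(double unitsPerSecond) {
        return Long.MAX_VALUE / unitsPerSecond / SECONDS_PER_YEAR;
    }

    public static void main(String[] args) {
        // Seconds resolution: roughly 292 billion years.
        System.out.printf("seconds:      +/- %.0f billion years%n", yearsCovered(1) / 1e9);
        // Millisecond resolution: roughly 292 million years.
        System.out.printf("milliseconds: +/- %.0f million years%n", yearsCovered(1000) / 1e6);
    }
}
```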
bq. The time zone determines only determines the ISO time string,
It also affects the field values (getDayOfWeek(), getHourOfDay(), etc.). In
your data, you can have dates belonging to different timezones, and users might
want to retain that information.
An example of a use case where the timezone also needs to be stored: if you
want to analyze how many people come to a global website during their morning
hours, you want getHourOfDay() to return the hour as per the local timezone.
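To illustrate the point, here is a small sketch showing one instant reporting
different hours depending on the zone attached to it. It uses java.time from
the JDK as a stand-in for Joda-Time so that it runs without extra libraries;
the class name, method name, and sample instant are made up for illustration.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

class LocalHourDemo {
    /** Hour of day (0-23) that the given epoch-millisecond instant shows on a wall clock in the zone. */
    static int hourOfDay(long epochMillis, String zoneId) {
        return ZonedDateTime
                .ofInstant(Instant.ofEpochMilli(epochMillis), ZoneId.of(zoneId))
                .getHour();
    }

    public static void main(String[] args) {
        // Hypothetical page visit; the stored long is the same in every zone.
        long visit = Instant.parse("2012-06-26T14:30:00Z").toEpochMilli();
        for (String zone : new String[] {"UTC", "Asia/Tokyo", "America/Los_Angeles"}) {
            System.out.println(zone + " -> hour " + hourOfDay(visit, zone));
        }
    }
}
```

Without the zone stored next to the long, only the UTC hour can be recovered,
which is not enough for the "morning hours" analysis.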
We need an efficient way to serialize the timezone along with the long. Can you
propose something? (Maybe just make it efficient for the 256 most 'popular'
timezones and store the timezone in a byte, with no byte for UTC. For other
timezones, add a timezone string?)
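One possible shape for such an encoding, as a rough sketch and not a proposal
for the actual Pig binary format. It is a simplified variant of the idea above:
every value carries one code byte (0 for UTC, 1..254 as an index into a table
of popular zones, 255 meaning a zone string follows), so UTC still costs one
byte here. Dropping that byte entirely would need a marker somewhere else,
which is not shown. The table contents and all names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

class CompactDateTimeCodec {
    // Hypothetical table of "popular" zone IDs; a real table could hold up to 254 entries.
    static final List<String> POPULAR = Arrays.asList(
            "America/Los_Angeles", "America/New_York", "Europe/London",
            "Asia/Kolkata", "Asia/Tokyo");
    static final int UTC_CODE = 0;      // no zone payload
    static final int STRING_CODE = 255; // length-prefixed zone ID follows

    static byte[] encode(long epochMillis, String zoneId) {
        if (zoneId.equals("UTC")) {
            return ByteBuffer.allocate(9).putLong(epochMillis).put((byte) UTC_CODE).array();
        }
        int index = POPULAR.indexOf(zoneId);
        if (index >= 0) {
            // Table hit: the zone costs a single byte (index shifted by one past UTC_CODE).
            return ByteBuffer.allocate(9).putLong(epochMillis).put((byte) (index + 1)).array();
        }
        // Fallback: code byte, one length byte, then the zone ID (IDs are far shorter than 256 bytes).
        byte[] name = zoneId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(10 + name.length)
                .putLong(epochMillis).put((byte) STRING_CODE)
                .put((byte) name.length).put(name).array();
    }

    static long decodeMillis(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    static String decodeZone(byte[] bytes) {
        ByteBuffer in = ByteBuffer.wrap(bytes);
        in.getLong(); // skip the instant
        int code = in.get() & 0xFF;
        if (code == UTC_CODE) return "UTC";
        if (code != STRING_CODE) return POPULAR.get(code - 1);
        byte[] name = new byte[in.get() & 0xFF];
        in.get(name);
        return new String(name, StandardCharsets.UTF_8);
    }
}
```

With this layout a UTC or table-listed value takes 9 bytes, and only the rare
unlisted zone pays for its ID string.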
bq. When we need to convert the DateTime object to Unix time string, we may use
the default time zone of the Pig environment
If the date field has the timezone value in it, we don't have to rely on the
default time zone to convert to a Unix timestamp (assuming that is what you
meant by 'unix time *string*').
But for UDFs like DateTime ToDate(String s), where the timezone might not be
specified, we need a default timezone. I think we should use the default
timezone of the Pig client machine. Using the default timezone of each task
tracker node can lead to a debugging nightmare if one of the nodes happens to
have a different timezone. We should also allow the user to set a default
timezone using a Pig property.
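A sketch of how that resolution could work: decide the zone once on the
client, then pin it into the job properties so that every task reads the same
value instead of consulting its own JVM default. The property name and the
class are assumptions for illustration, not an existing Pig API, and java.time
stands in for Joda-Time.

```java
import java.time.ZoneId;
import java.util.Properties;

class DefaultZoneResolver {
    // Hypothetical property name, used here only for illustration.
    static final String PROPERTY = "pig.datetime.default.tz";

    /** Runs on the client: an explicit property wins, otherwise the client JVM's default zone. */
    static ZoneId resolveOnClient(Properties props) {
        String configured = props.getProperty(PROPERTY);
        return configured != null ? ZoneId.of(configured) : ZoneId.systemDefault();
    }

    /** Writes the resolved zone back so that tasks never fall back to their own node's default. */
    static void pinForTasks(Properties jobProps) {
        jobProps.setProperty(PROPERTY, resolveOnClient(jobProps).getId());
    }
}
```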
bq. We probably need one more UDF String ToString(DateTime d, String format,
String timezone)
Having the timezone argument in this call is necessary only if the user wants
to print the time for a different timezone. This is useful, but not mandatory.
bq.Since the ISO duration is non-negative (Please correct me if I'm wrong), we
need to SubstractDuration as well.
Yes, you are right. I could not find any references to negative values in ISO
duration. Let's add SubstractDuration.
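For illustration, a pair of add/subtract helpers over a millisecond instant
might look like the sketch below. It uses java.time.Duration from the JDK as a
stand-in for Joda-Time, which means it only handles day-and-time durations
such as "PT1H" or "P1D" and not year or month fields. The class and method
names are made up and are not the UDF signatures under discussion.

```java
import java.time.Duration;
import java.time.Instant;

class DurationMath {
    /** Shifts the instant forward by an ISO 8601 day/time duration, e.g. "PT1H30M". */
    static long addDuration(long epochMillis, String isoDuration) {
        return Instant.ofEpochMilli(epochMillis).plus(Duration.parse(isoDuration)).toEpochMilli();
    }

    /** Shifts the instant backward by the same kind of duration; no negative duration string is needed. */
    static long subtractDuration(long epochMillis, String isoDuration) {
        return Instant.ofEpochMilli(epochMillis).minus(Duration.parse(isoDuration)).toEpochMilli();
    }
}
```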
Trivia from Wikipedia: a 64-bit Unix timestamp, in the negative direction, goes
back more than twenty times the age of the universe.
> Add DateTime Support to Pig
> ---------------------------
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
> Issue Type: Bug
> Components: data
> Affects Versions: 0.7.0
> Reporter: Russell Jurney
> Assignee: Zhijie Shen
> Labels: gsoc2012
> Attachments: PIG-1314-1.patch, PIG-1314-2.patch, joda_vs_builtin.zip
>
> Original Estimate: 672h
> Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a
> timestamp component. Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?
> We're looking at doing this, rather than use UDFs. Is this a patch that
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information
> about the program can be found at
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012