> On Aug. 7, 2012, 12:56 a.m., Ashutosh Chauhan wrote:
> > trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateDiff.java, line 312
> > <https://reviews.apache.org/r/6027/diff/1/?file=124154#file124154line312>
> >
> >     Looks like this function is not used anywhere. Please remove it.

fixed!


> On Aug. 7, 2012, 12:56 a.m., Ashutosh Chauhan wrote:
> > trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateDiff.java, line 340
> > <https://reviews.apache.org/r/6027/diff/1/?file=124154#file124154line340>
> >
> >     Looks like this function is not used anywhere. Please remove it.

fixed!


> On Aug. 7, 2012, 12:56 a.m., Ashutosh Chauhan wrote:
> > trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateDiff.java, line 180
> > <https://reviews.apache.org/r/6027/diff/1/?file=124154#file124154line180>
> >
> >     Avoid unnecessary object creation. Instead, do Date date1 = 
> > resolveDate(dateObj1, unit), which is more appropriate. Similarly for date2.

fixed!


> On Aug. 7, 2012, 12:56 a.m., Ashutosh Chauhan wrote:
> > trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateDiff.java, line 57
> > <https://reviews.apache.org/r/6027/diff/1/?file=124154#file124154line57>
> >
> >     Instead of these enums, can we use the int constants listed at 
> > http://docs.oracle.com/javase/6/docs/api/constant-values.html#java.text.DateFormat
> > ?
> >     
> >     Also, I don't think microseconds make sense; we don't have that 
> > precision in any case.

Not every supported unit has an int associated with it (DateFormat has no 
quarter or microsecond constant, for example), so does it still make sense to 
use these values? Also, is it ideal to use raw ints when not everyone knows 
that, for example, 1 = year and 6 = minute?
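The trade-off discussed here can be illustrated with a small sketch: an enum keeps readable unit names (parsed from the user's string) and can still carry the java.text.DateFormat field constant where one exists. The enum, the NO_FIELD sentinel, and the pairing of each unit with a particular DateFormat constant are assumptions for illustration, not the patch's code.

```java
import java.text.DateFormat;
import java.util.Locale;

public class DiffUnits {
    // Hypothetical sentinel for units that have no java.text.DateFormat constant.
    static final int NO_FIELD = -1;

    enum Unit {
        MICROSECOND(NO_FIELD),
        MILLISECOND(DateFormat.MILLISECOND_FIELD),
        SECOND(DateFormat.SECOND_FIELD),
        MINUTE(DateFormat.MINUTE_FIELD),      // the raw value is 6
        HOUR(DateFormat.HOUR_OF_DAY0_FIELD),
        DAY(DateFormat.DATE_FIELD),
        WEEK(DateFormat.WEEK_OF_YEAR_FIELD),
        MONTH(DateFormat.MONTH_FIELD),
        QUARTER(NO_FIELD),
        YEAR(DateFormat.YEAR_FIELD);          // the raw value is 1

        final int dateFormatField;

        Unit(int dateFormatField) {
            this.dateFormatField = dateFormatField;
        }

        // Case-insensitive lookup of the unit string passed to the UDF, e.g. "day".
        // Throws IllegalArgumentException for an unknown unit name.
        static Unit parse(String s) {
            return Unit.valueOf(s.trim().toUpperCase(Locale.ROOT));
        }
    }
}
```

Code that reads `Unit.MINUTE` says what it means, while the integer is still available through `dateFormatField` when an API needs it. DateFormat has no constant for quarter or microsecond, so those fall back to the sentinel.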


> On Aug. 7, 2012, 12:56 a.m., Ashutosh Chauhan wrote:
> > trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateDiff.java, line 65
> > <https://reviews.apache.org/r/6027/diff/1/?file=124154#file124154line65>
> >
> >     Let's get rid of the formatter variable, add the default format 
> > ("yyyy-MM-dd") as the first format in dateFormats, and use formatLong() for 
> > all formats.

formatLong() tries each of those formats in order and uses the first one that 
matches. The order matters because a timestamp of the form yyyy-MM-dd 
HH:mm:ss fits both the yyyy-MM-dd HH:mm:ss and the yyyy-MM-dd formats, and I 
don't want it to match yyyy-MM-dd and have the HH:mm:ss part silently ignored. 
Do you still think putting the default format at index 0 of the array is the 
best option?
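The concern above comes from how DateFormat parsing works: `parse(String, ParsePosition)` stops once the pattern is satisfied and does not require the rest of the input to be consumed, so a "yyyy-MM-dd" format accepts "2012-06-01 12:30:00" and discards the time. A minimal sketch of one way around this, assuming String input, writing the description's ".milli" as the pattern letters SSS, and using a hypothetical class name DateParsing: also require that the whole string was consumed. Following the review's point about object creation, the formats are built once and reused.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;

public class DateParsing {
    private static final String[] PATTERNS = {
        "yyyy-MM-dd HH:mm:ss.SSS",
        "yyyy-MM-dd HH:mm:ss",
        "yyyy-MM-dd"
    };

    // Built once per instance and reused for every value (not thread-safe).
    private final SimpleDateFormat[] formats = new SimpleDateFormat[PATTERNS.length];
    private final ParsePosition pos = new ParsePosition(0);

    public DateParsing() {
        for (int i = 0; i < PATTERNS.length; i++) {
            formats[i] = new SimpleDateFormat(PATTERNS[i]);
            formats[i].setLenient(false);
        }
    }

    // Returns the value parsed by the first pattern that consumes the entire
    // string, or null if no pattern does.
    public Date parse(String value) {
        for (SimpleDateFormat format : formats) {
            pos.setIndex(0);
            pos.setErrorIndex(-1);
            Date d = format.parse(value, pos);
            // A bare yyyy-MM-dd match on a longer timestamp leaves pos short
            // of the end, so it is rejected instead of silently dropping the time.
            if (d != null && pos.getIndex() == value.length()) {
                return d;
            }
        }
        return null;
    }
}
```

With the full-input check, the default yyyy-MM-dd format could sit at index 0 without swallowing longer timestamps; the order then only affects how many attempts are made per value.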


- Shefali


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6027/#review9924
-----------------------------------------------------------


On July 18, 2012, 12:56 a.m., Shefali Vohra wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6027/
> -----------------------------------------------------------
> 
> (Updated July 18, 2012, 12:56 a.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Description
> -------
> 
> Parameters
>  This function overloads the current DateDiff(expr1, expr2) by adding a third 
> parameter that specifies the unit. The first two parameters are timestamps, 
> and the accepted formats are:
>  yyyy-MM-dd
>  yyyy-MM-dd HH:mm:ss
>  yyyy-MM-dd HH:mm:ss.milli
> 
> These are the formats accepted by the current DateDiff(expr1, expr2) function, 
> which keeps the two functions consistent. As with the existing function, the 
> accepted data types for the timestamps are Text, TimestampWritable, Date, and 
> String.
> 
> The third parameter is the unit in which the result is expressed. 
> Acceptable units are:
>  Microsecond
>  Millisecond
>  Second
>  Minute
>  Hour
>  Day
>  Week
>  Month
>  Quarter
>  Year
> 
> When calculating the difference, the full timestamp is used when the 
> specified unit is hour or smaller (microsecond, millisecond, second, minute, 
> hour), and only the date part is used when the unit is day or larger (day, 
> week, month, quarter, year). For the smaller units, if a value has no time 
> part (yyyy-MM-dd format), a time of 00:00:00.0 is assumed. Leap years are 
> handled by Java's Calendar class.
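The rule above (full timestamp for hour and smaller, date part only for day and larger) can be sketched as follows. This is not the patch's code: it uses java.time (Java 8+) rather than the Calendar class, takes already-parsed LocalDateTime values, and assumes that week, month, quarter, and year count complete units elapsed, truncated toward zero. The patch may instead count calendar boundaries crossed, the way SQL Server's DATEDIFF does. The class and the `fromDate` helper are hypothetical.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.Locale;

public class DateDiffSketch {
    // Returns expr1 - expr2 in the given unit, as a long.
    static long diff(LocalDateTime expr1, LocalDateTime expr2, String unit) {
        String u = unit.toLowerCase(Locale.ROOT);
        // Hour and smaller: compare the full timestamps.
        switch (u) {
            case "microsecond": return ChronoUnit.MICROS.between(expr2, expr1);
            case "millisecond": return ChronoUnit.MILLIS.between(expr2, expr1);
            case "second":      return ChronoUnit.SECONDS.between(expr2, expr1);
            case "minute":      return ChronoUnit.MINUTES.between(expr2, expr1);
            case "hour":        return ChronoUnit.HOURS.between(expr2, expr1);
            default:            break;
        }
        // Day and larger: drop the time of day before comparing.
        LocalDate d1 = expr1.toLocalDate();
        LocalDate d2 = expr2.toLocalDate();
        switch (u) {
            case "day":     return ChronoUnit.DAYS.between(d2, d1);
            case "week":    return ChronoUnit.WEEKS.between(d2, d1);
            case "month":   return ChronoUnit.MONTHS.between(d2, d1);
            case "quarter": return ChronoUnit.MONTHS.between(d2, d1) / 3;
            case "year":    return ChronoUnit.YEARS.between(d2, d1);
            default: throw new IllegalArgumentException("Unsupported unit: " + unit);
        }
    }

    // A yyyy-MM-dd value with no time part is treated as 00:00:00.0.
    static LocalDateTime fromDate(String yyyyMMdd) {
        return LocalDate.parse(yyyyMMdd).atStartOfDay();
    }
}
```

Under the complete-units assumption, 2012-06-01 minus 2012-05-31 is 0 months but 1 day, whereas a boundary-counting rule would give 1 for both. Leap years fall out of the date arithmetic: 2012-03-01 minus 2012-02-01 is 29 days.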
> 
> The assumption is made that all these time parameters are in the same time 
> zone.
> 
> Return Value
>  The function returns expr1 - expr2 as an int, expressed in the specified 
> unit.
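Because the result is an int, the finest units overflow quickly: Integer.MAX_VALUE milliseconds is just under 25 days, and Integer.MAX_VALUE microseconds is under 36 minutes. A minimal sketch of guarding the narrowing, assuming the difference is first computed as a long (the description does not say how it is computed); the class name is hypothetical:

```java
public class IntResultCheck {
    // Narrows a long difference to the int the UDF returns. Math.toIntExact
    // throws ArithmeticException on overflow instead of silently wrapping,
    // which is what a plain (int) cast would do.
    static int toIntResult(long diff) {
        return Math.toIntExact(diff);
    }
}
```

Whether an out-of-range result should raise an error or return NULL is a design choice for the UDF; the sketch only shows where the check belongs.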
> 
> Hive vs. SQL
>  SQL also has a DateDiff() function, which accepts more units. The order of 
> parameters differs between SQL and Hive because Hive already has a 
> DateDiff() function with the same first two parameters; keeping that order 
> here maintains consistency within Hive.
> 
> Example Query
>  hive> DATEDIFF(DATE_FIELD, '2012-06-01', 'day');
> 
> Diagnostic Error Messages
>  Invalid table alias or column name reference
>  Table not found
> 
> 
> This addresses bug HIVE-3216.
>     https://issues.apache.org/jira/browse/HIVE-3216
> 
> 
> Diffs
> -----
> 
>   trunk/data/files/datetable.txt PRE-CREATION 
>   trunk/data/files/timestamptable.txt PRE-CREATION 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateDiff.java 1362724 
>   trunk/ql/src/test/queries/clientnegative/udf_datediff.q PRE-CREATION 
>   trunk/ql/src/test/queries/clientpositive/udf_datediff.q 1362724 
>   trunk/ql/src/test/results/clientnegative/udf_datediff.q.out PRE-CREATION 
>   trunk/ql/src/test/results/clientpositive/udf_datediff.q.out 1362724 
> 
> Diff: https://reviews.apache.org/r/6027/diff/
> 
> 
> Testing
> -------
> 
> Positive and negative test cases are included.
> 
> 
> Thanks,
> 
> Shefali Vohra
> 
>
