paul-rogers commented on a change in pull request #2268:
URL: https://github.com/apache/drill/pull/2268#discussion_r670925601
##########
File path:
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/DateTypeFunctions.java
##########
@@ -470,43 +500,73 @@ public void eval() {
@Param DateHolder left;
@Param DateHolder right;
@Output IntervalHolder out;
+ @Inject ContextInformation contextInfo;
@Override
public void setup() {
}
@Override
public void eval() {
- long diff = left.value - right.value;
- long days = diff /
org.apache.drill.exec.vector.DateUtilities.daysToStandardMillis;
- out.months = (int) (days /
org.apache.drill.exec.vector.DateUtilities.monthToStandardDays);
- out.days = (int) (days %
org.apache.drill.exec.vector.DateUtilities.monthToStandardDays);
- out.milliseconds = (int) (diff %
org.apache.drill.exec.vector.DateUtilities.daysToStandardMillis);
+ java.time.OffsetDateTime from =
java.time.Instant.ofEpochMilli(right.value).atOffset(java.time.ZoneOffset.UTC);
+ java.time.OffsetDateTime to =
java.time.Instant.ofEpochMilli(left.value).atOffset(java.time.ZoneOffset.UTC);
+ java.time.Duration duration =
java.time.Duration.between(from.toLocalTime(), to.toLocalTime());
+ java.time.Period period;
+
+ if (from.isAfter(to) &&
duration.compareTo(java.time.Duration.ZERO) > 0) {
+ // negative period and positive duration
+ period = java.time.Period.between(from.toLocalDate(),
to.toLocalDate().plusDays(1));
+ duration = duration.minusDays(1);
+ } else if (from.isBefore(to) &&
duration.compareTo(java.time.Duration.ZERO) < 0) {
+ // positive period and negative duration
+ period = java.time.Period.between(from.toLocalDate(),
to.toLocalDate().minusDays(1));
+ duration = duration.plusDays(1);
+ } else {
+ period = java.time.Period.between(from.toLocalDate(),
to.toLocalDate());
+ }
+
+ out.months = (int) period.toTotalMonths();
+ out.days = period.getDays();
+ out.milliseconds = (int) duration.toMillis();
}
}
@FunctionTemplate(name = "age", scope =
FunctionTemplate.FunctionScope.SIMPLE, nulls = NullHandling.NULL_IF_NULL)
public static class AgeDate2Function implements DrillSimpleFunc {
@Param DateHolder right;
- @Workspace long queryStartDate;
+ @Workspace java.time.OffsetDateTime queryStartDate;
@Output IntervalHolder out;
@Inject ContextInformation contextInfo;
@Override
public void setup() {
int timeZoneIndex = contextInfo.getRootFragmentTimeZone();
- org.joda.time.DateTimeZone timeZone =
org.joda.time.DateTimeZone.forID(org.apache.drill.exec.expr.fn.impl.DateUtility.getTimeZone(timeZoneIndex));
- org.joda.time.DateTime now = new
org.joda.time.DateTime(contextInfo.getQueryStartTime(), timeZone);
- queryStartDate = (new org.joda.time.DateMidnight(now.getYear(),
now.getMonthOfYear(), now.getDayOfMonth(), timeZone)).getMillis();
+ java.time.ZoneId zoneId =
java.time.ZoneId.of(org.apache.drill.exec.expr.fn.impl.DateUtility.getTimeZone(timeZoneIndex));
+ java.time.ZonedDateTime dt =
java.time.Instant.ofEpochMilli(contextInfo.getQueryStartTime()).atZone(zoneId);
+ queryStartDate = java.time.OffsetDateTime.of(dt.toLocalDate(),
java.time.LocalTime.MIDNIGHT, java.time.ZoneOffset.UTC);
Review comment:
See detailed comment below. The `queryStartDate` should be in the
**local** time zone since Drill interprets all dates and times (including
timestamps) in the local time.
Note that the above is actually wrong. Drill is distributed. The above is
computed per-fragment. Different fragments will start at different times. There
will be corner cases in which that difference is enough to cause a duration
that is 23H23M59S on one machine, and 1D00H00M02S on another machine or
fragment. The result will be random errors.
Specifically, run a query across two machines, with a run of inputs that all
have the same date. The output will contain jitter: ages may differ depending
on which machine processed the given row.
Further, there is no guarantee that any two machines have synchronized
clocks, leading to further errors.
The right answer would be to provide a "global" true query start time in the
query plan. All `AGE()` calculations should run relative to that time.
##########
File path:
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/DateTypeFunctions.java
##########
@@ -470,43 +500,73 @@ public void eval() {
@Param DateHolder left;
@Param DateHolder right;
@Output IntervalHolder out;
+ @Inject ContextInformation contextInfo;
@Override
public void setup() {
}
@Override
public void eval() {
- long diff = left.value - right.value;
- long days = diff /
org.apache.drill.exec.vector.DateUtilities.daysToStandardMillis;
- out.months = (int) (days /
org.apache.drill.exec.vector.DateUtilities.monthToStandardDays);
- out.days = (int) (days %
org.apache.drill.exec.vector.DateUtilities.monthToStandardDays);
- out.milliseconds = (int) (diff %
org.apache.drill.exec.vector.DateUtilities.daysToStandardMillis);
+ java.time.OffsetDateTime from =
java.time.Instant.ofEpochMilli(right.value).atOffset(java.time.ZoneOffset.UTC);
+ java.time.OffsetDateTime to =
java.time.Instant.ofEpochMilli(left.value).atOffset(java.time.ZoneOffset.UTC);
+ java.time.Duration duration =
java.time.Duration.between(from.toLocalTime(), to.toLocalTime());
+ java.time.Period period;
+
+ if (from.isAfter(to) &&
duration.compareTo(java.time.Duration.ZERO) > 0) {
+ // negative period and positive duration
+ period = java.time.Period.between(from.toLocalDate(),
to.toLocalDate().plusDays(1));
+ duration = duration.minusDays(1);
+ } else if (from.isBefore(to) &&
duration.compareTo(java.time.Duration.ZERO) < 0) {
+ // positive period and negative duration
+ period = java.time.Period.between(from.toLocalDate(),
to.toLocalDate().minusDays(1));
+ duration = duration.plusDays(1);
+ } else {
+ period = java.time.Period.between(from.toLocalDate(),
to.toLocalDate());
+ }
+
+ out.months = (int) period.toTotalMonths();
+ out.days = period.getDays();
+ out.milliseconds = (int) duration.toMillis();
}
}
@FunctionTemplate(name = "age", scope =
FunctionTemplate.FunctionScope.SIMPLE, nulls = NullHandling.NULL_IF_NULL)
public static class AgeDate2Function implements DrillSimpleFunc {
@Param DateHolder right;
- @Workspace long queryStartDate;
+ @Workspace java.time.OffsetDateTime queryStartDate;
@Output IntervalHolder out;
@Inject ContextInformation contextInfo;
@Override
public void setup() {
int timeZoneIndex = contextInfo.getRootFragmentTimeZone();
- org.joda.time.DateTimeZone timeZone =
org.joda.time.DateTimeZone.forID(org.apache.drill.exec.expr.fn.impl.DateUtility.getTimeZone(timeZoneIndex));
- org.joda.time.DateTime now = new
org.joda.time.DateTime(contextInfo.getQueryStartTime(), timeZone);
- queryStartDate = (new org.joda.time.DateMidnight(now.getYear(),
now.getMonthOfYear(), now.getDayOfMonth(), timeZone)).getMillis();
+ java.time.ZoneId zoneId =
java.time.ZoneId.of(org.apache.drill.exec.expr.fn.impl.DateUtility.getTimeZone(timeZoneIndex));
+ java.time.ZonedDateTime dt =
java.time.Instant.ofEpochMilli(contextInfo.getQueryStartTime()).atZone(zoneId);
+ queryStartDate = java.time.OffsetDateTime.of(dt.toLocalDate(),
java.time.LocalTime.MIDNIGHT, java.time.ZoneOffset.UTC);
}
@Override
public void eval() {
- long diff = queryStartDate - right.value;
- long days = diff /
org.apache.drill.exec.vector.DateUtilities.daysToStandardMillis;
- out.months = (int) (days /
org.apache.drill.exec.vector.DateUtilities.monthToStandardDays);
- out.days = (int) (days %
org.apache.drill.exec.vector.DateUtilities.monthToStandardDays);
- out.milliseconds = (int) (diff %
org.apache.drill.exec.vector.DateUtilities.daysToStandardMillis);
+ java.time.OffsetDateTime dt =
java.time.Instant.ofEpochMilli(right.value).atOffset(java.time.ZoneOffset.UTC);
+ java.time.Duration duration =
java.time.Duration.between(dt.toLocalTime(), queryStartDate.toLocalTime());
+ java.time.Period period;
+
+ if (dt.isAfter(queryStartDate) &&
duration.compareTo(java.time.Duration.ZERO) > 0) {
Review comment:
This is tricky code. It *should not* be in a UDF where you can neither
test nor debug it. Instead, factor it into a function. Write unit tests for the
functions for all the three paths (and other corner cases you can think up.)
Call that function from here.
Given the number of function calls in this code, and the number of memory
allocations, let's not worry about the overhead of adding one more function
layer. Let's instead ensure the cases work.
##########
File path:
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/DateTypeFunctions.java
##########
@@ -470,43 +500,73 @@ public void eval() {
@Param DateHolder left;
@Param DateHolder right;
@Output IntervalHolder out;
+ @Inject ContextInformation contextInfo;
@Override
public void setup() {
}
@Override
public void eval() {
- long diff = left.value - right.value;
- long days = diff /
org.apache.drill.exec.vector.DateUtilities.daysToStandardMillis;
- out.months = (int) (days /
org.apache.drill.exec.vector.DateUtilities.monthToStandardDays);
- out.days = (int) (days %
org.apache.drill.exec.vector.DateUtilities.monthToStandardDays);
- out.milliseconds = (int) (diff %
org.apache.drill.exec.vector.DateUtilities.daysToStandardMillis);
+ java.time.OffsetDateTime from =
java.time.Instant.ofEpochMilli(right.value).atOffset(java.time.ZoneOffset.UTC);
+ java.time.OffsetDateTime to =
java.time.Instant.ofEpochMilli(left.value).atOffset(java.time.ZoneOffset.UTC);
+ java.time.Duration duration =
java.time.Duration.between(from.toLocalTime(), to.toLocalTime());
+ java.time.Period period;
+
+ if (from.isAfter(to) &&
duration.compareTo(java.time.Duration.ZERO) > 0) {
+ // negative period and positive duration
+ period = java.time.Period.between(from.toLocalDate(),
to.toLocalDate().plusDays(1));
+ duration = duration.minusDays(1);
+ } else if (from.isBefore(to) &&
duration.compareTo(java.time.Duration.ZERO) < 0) {
+ // positive period and negative duration
+ period = java.time.Period.between(from.toLocalDate(),
to.toLocalDate().minusDays(1));
+ duration = duration.plusDays(1);
+ } else {
+ period = java.time.Period.between(from.toLocalDate(),
to.toLocalDate());
+ }
+
+ out.months = (int) period.toTotalMonths();
+ out.days = period.getDays();
+ out.milliseconds = (int) duration.toMillis();
}
}
@FunctionTemplate(name = "age", scope =
FunctionTemplate.FunctionScope.SIMPLE, nulls = NullHandling.NULL_IF_NULL)
public static class AgeDate2Function implements DrillSimpleFunc {
@Param DateHolder right;
- @Workspace long queryStartDate;
+ @Workspace java.time.OffsetDateTime queryStartDate;
@Output IntervalHolder out;
@Inject ContextInformation contextInfo;
@Override
public void setup() {
int timeZoneIndex = contextInfo.getRootFragmentTimeZone();
- org.joda.time.DateTimeZone timeZone =
org.joda.time.DateTimeZone.forID(org.apache.drill.exec.expr.fn.impl.DateUtility.getTimeZone(timeZoneIndex));
- org.joda.time.DateTime now = new
org.joda.time.DateTime(contextInfo.getQueryStartTime(), timeZone);
- queryStartDate = (new org.joda.time.DateMidnight(now.getYear(),
now.getMonthOfYear(), now.getDayOfMonth(), timeZone)).getMillis();
+ java.time.ZoneId zoneId =
java.time.ZoneId.of(org.apache.drill.exec.expr.fn.impl.DateUtility.getTimeZone(timeZoneIndex));
+ java.time.ZonedDateTime dt =
java.time.Instant.ofEpochMilli(contextInfo.getQueryStartTime()).atZone(zoneId);
+ queryStartDate = java.time.OffsetDateTime.of(dt.toLocalDate(),
java.time.LocalTime.MIDNIGHT, java.time.ZoneOffset.UTC);
}
@Override
public void eval() {
- long diff = queryStartDate - right.value;
- long days = diff /
org.apache.drill.exec.vector.DateUtilities.daysToStandardMillis;
- out.months = (int) (days /
org.apache.drill.exec.vector.DateUtilities.monthToStandardDays);
- out.days = (int) (days %
org.apache.drill.exec.vector.DateUtilities.monthToStandardDays);
- out.milliseconds = (int) (diff %
org.apache.drill.exec.vector.DateUtilities.daysToStandardMillis);
+ java.time.OffsetDateTime dt =
java.time.Instant.ofEpochMilli(right.value).atOffset(java.time.ZoneOffset.UTC);
Review comment:
The above has a bug due to a horrible Drill bug.
Drill's "timestamp" is relative to 1/1/970 local time, not UTC. The
`right.value` is a number of milliseconds since 1970-01-01'T'00:00:00 **local
time**. This is true even if the underlying data source has dates in the
(currently unsupported) UTC time zone, unless the (virtual) machine running
Drill runs in the UTC time zone.
This is horrible, wrong in the era of big data, but it is what it is. It was
explained to me, as "it is what customer's expect because that's how Oracle
does it", and Oracle does it that way because the original DATE, TIME and
DATETIME had no implied time zone, they were "the time wherever you happen to
be" because they were defined in the late 1970s.
In terms of Java time, we have to convert the ms arguments into a
`LocalTime`. It appears that `Period` will "do the right thing" if given two
`LocalTime` arguments.
Fortunately, here, since we are computing an `AGE()`, we get the right
answer if we work in either UTC or local time.
##########
File path:
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/DateTypeFunctions.java
##########
@@ -470,43 +500,73 @@ public void eval() {
@Param DateHolder left;
@Param DateHolder right;
@Output IntervalHolder out;
+ @Inject ContextInformation contextInfo;
@Override
public void setup() {
}
@Override
public void eval() {
- long diff = left.value - right.value;
- long days = diff /
org.apache.drill.exec.vector.DateUtilities.daysToStandardMillis;
- out.months = (int) (days /
org.apache.drill.exec.vector.DateUtilities.monthToStandardDays);
- out.days = (int) (days %
org.apache.drill.exec.vector.DateUtilities.monthToStandardDays);
- out.milliseconds = (int) (diff %
org.apache.drill.exec.vector.DateUtilities.daysToStandardMillis);
+ java.time.OffsetDateTime from =
java.time.Instant.ofEpochMilli(right.value).atOffset(java.time.ZoneOffset.UTC);
+ java.time.OffsetDateTime to =
java.time.Instant.ofEpochMilli(left.value).atOffset(java.time.ZoneOffset.UTC);
+ java.time.Duration duration =
java.time.Duration.between(from.toLocalTime(), to.toLocalTime());
+ java.time.Period period;
+
+ if (from.isAfter(to) &&
duration.compareTo(java.time.Duration.ZERO) > 0) {
+ // negative period and positive duration
+ period = java.time.Period.between(from.toLocalDate(),
to.toLocalDate().plusDays(1));
+ duration = duration.minusDays(1);
+ } else if (from.isBefore(to) &&
duration.compareTo(java.time.Duration.ZERO) < 0) {
+ // positive period and negative duration
+ period = java.time.Period.between(from.toLocalDate(),
to.toLocalDate().minusDays(1));
+ duration = duration.plusDays(1);
+ } else {
+ period = java.time.Period.between(from.toLocalDate(),
to.toLocalDate());
+ }
+
+ out.months = (int) period.toTotalMonths();
+ out.days = period.getDays();
+ out.milliseconds = (int) duration.toMillis();
}
}
@FunctionTemplate(name = "age", scope =
FunctionTemplate.FunctionScope.SIMPLE, nulls = NullHandling.NULL_IF_NULL)
public static class AgeDate2Function implements DrillSimpleFunc {
@Param DateHolder right;
- @Workspace long queryStartDate;
+ @Workspace java.time.OffsetDateTime queryStartDate;
@Output IntervalHolder out;
@Inject ContextInformation contextInfo;
@Override
public void setup() {
int timeZoneIndex = contextInfo.getRootFragmentTimeZone();
- org.joda.time.DateTimeZone timeZone =
org.joda.time.DateTimeZone.forID(org.apache.drill.exec.expr.fn.impl.DateUtility.getTimeZone(timeZoneIndex));
- org.joda.time.DateTime now = new
org.joda.time.DateTime(contextInfo.getQueryStartTime(), timeZone);
- queryStartDate = (new org.joda.time.DateMidnight(now.getYear(),
now.getMonthOfYear(), now.getDayOfMonth(), timeZone)).getMillis();
+ java.time.ZoneId zoneId =
java.time.ZoneId.of(org.apache.drill.exec.expr.fn.impl.DateUtility.getTimeZone(timeZoneIndex));
+ java.time.ZonedDateTime dt =
java.time.Instant.ofEpochMilli(contextInfo.getQueryStartTime()).atZone(zoneId);
+ queryStartDate = java.time.OffsetDateTime.of(dt.toLocalDate(),
java.time.LocalTime.MIDNIGHT, java.time.ZoneOffset.UTC);
}
@Override
public void eval() {
- long diff = queryStartDate - right.value;
- long days = diff /
org.apache.drill.exec.vector.DateUtilities.daysToStandardMillis;
- out.months = (int) (days /
org.apache.drill.exec.vector.DateUtilities.monthToStandardDays);
- out.days = (int) (days %
org.apache.drill.exec.vector.DateUtilities.monthToStandardDays);
- out.milliseconds = (int) (diff %
org.apache.drill.exec.vector.DateUtilities.daysToStandardMillis);
+ java.time.OffsetDateTime dt =
java.time.Instant.ofEpochMilli(right.value).atOffset(java.time.ZoneOffset.UTC);
+ java.time.Duration duration =
java.time.Duration.between(dt.toLocalTime(), queryStartDate.toLocalTime());
+ java.time.Period period;
Review comment:
Shame we have to do memory allocations in the per-column inner loop.
Drill goes to heroic lengths to avoid such allocations. Is there some "modern"
library which can do these calcs without creating garbage? Or, will Drill's (or
the JVM's) optimizer rewrite the code to avoid garbage?
##########
File path:
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/DateTypeFunctions.java
##########
@@ -425,43 +425,73 @@ public void eval() {
@Param TimeStampHolder left;
@Param TimeStampHolder right;
@Output IntervalHolder out;
+ @Inject ContextInformation contextInfo;
@Override
public void setup() {
}
@Override
public void eval() {
- long diff = left.value - right.value;
- long days = diff /
org.apache.drill.exec.vector.DateUtilities.daysToStandardMillis;
- out.months = (int) (days /
org.apache.drill.exec.vector.DateUtilities.monthToStandardDays);
- out.days = (int) (days %
org.apache.drill.exec.vector.DateUtilities.monthToStandardDays);
- out.milliseconds = (int) (diff %
org.apache.drill.exec.vector.DateUtilities.daysToStandardMillis);
+ java.time.OffsetDateTime from =
java.time.Instant.ofEpochMilli(right.value).atOffset(java.time.ZoneOffset.UTC);
+ java.time.OffsetDateTime to =
java.time.Instant.ofEpochMilli(left.value).atOffset(java.time.ZoneOffset.UTC);
+ java.time.Duration duration =
java.time.Duration.between(from.toLocalTime(), to.toLocalTime());
+ java.time.Period period;
+
+ if (from.isAfter(to) &&
duration.compareTo(java.time.Duration.ZERO) > 0) {
Review comment:
This code appears to be a copy/paste of the code below (or visa versa.)
In general, reuse via copy/past is discouraged.
See note below about factoring out the (common) code into a separate
function which you can test and call from both of the UDFs.
Note that I'd suggest doing the Drill-type-to-Java type conversions in that
function. These conversions are complex; it would be good to have extensive
unit tests. Running queries to test the code is, as noted, below, too
cumbersome.
##########
File path: exec/java-exec/src/test/java/org/apache/drill/TestBugFixes.java
##########
@@ -333,4 +333,15 @@ public void testDRILL6547() throws Exception {
.build()
.run();
}
+
+ @Test
+ public void testDRILL7926() throws Exception {
+ testBuilder()
+ .sqlQuery("SELECT EXTRACT(YEAR FROM M.`YEARS`) `YEARS` FROM (SELECT
AGE('2021-05-13', '2007-07-02') `YEARS`) M")
Review comment:
This test is inadequate for the many cases in the code and that we
discussed above.
Suggestion: as noted above, factor the "hard part" into functions which you
can unit test outside a query. Then, the above is fine as a sanity test.
##########
File path:
exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestDateFunctions.java
##########
@@ -104,10 +104,10 @@ public void testDateDifferenceArithmetic() throws
Exception {
@Test
public void testAge() throws Exception {
- String[] expectedResults = {"P109M16DT82800S",
- "P172M27D",
- "P-172M-27D",
- "P-39M-18DT-63573S"};
+ String[] expectedResults = {"P107M30DT82800S",
Review comment:
Really? I guess we have zero tests in this area, which is as scary as
the horrible bug. We really should track down other references, fix them, and
then remove the buggy code. I suspect that something in the vector layer may
have used this code also, IIRC.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]