This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 0024173 [SPARK-27405][SQL][TEST] Restrict the range of generated
random timestamps
0024173 is described below
commit 00241733a6d7761a4f6e64442319b11dd9d2b57b
Author: Maxim Gekk <[email protected]>
AuthorDate: Mon Apr 8 09:53:00 2019 -0700
[SPARK-27405][SQL][TEST] Restrict the range of generated random timestamps
## What changes were proposed in this pull request?
In the PR, I propose to restrict the range of random timestamp literals
generated in `LiteralGenerator. timestampLiteralGen`. The generator creates
instances of `java.sql.Timestamp` by passing milliseconds since epoch as `Long`
type. Converting the milliseconds to microseconds can cause arithmetic overflow
of Long type because Catalyst's Timestamp type stores microseconds since epoch
in `Long` type internally as well. Proposed interval of random milliseconds is
`[Long.MinValue / 1000, [...]
For example, generated timestamp `new
java.sql.Timestamp(-3948373668011580000)` causes `Long` overflow at the method:
```scala
def fromJavaTimestamp(t: Timestamp): SQLTimestamp = {
...
MILLISECONDS.toMicros(t.getTime()) +
NANOSECONDS.toMicros(t.getNanos()) % NANOS_PER_MICROS
...
}
```
because `t.getTime()` returns `-3948373668011580000` which is multiplied by
`1000` at `MILLISECONDS.toMicros`, and the result `-3948373668011580000000` is
less than `Long.MinValue`.
## How was this patch tested?
By `DateExpressionsSuite` in the PR
https://github.com/apache/spark/pull/24311
Closes #24316 from MaxGekk/random-timestamps-gen.
Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
.../spark/sql/catalyst/expressions/LiteralGenerator.scala | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala
index 032aec0..be5fdb5 100644
---
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala
+++
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala
@@ -21,6 +21,7 @@ import java.sql.{Date, Timestamp}
import org.scalacheck.{Arbitrary, Gen}
+import org.apache.spark.sql.catalyst.util.DateTimeUtils
import org.apache.spark.sql.types._
import org.apache.spark.unsafe.types.CalendarInterval
@@ -102,8 +103,16 @@ object LiteralGenerator {
lazy val dateLiteralGen: Gen[Literal] =
for { d <- Arbitrary.arbInt.arbitrary } yield Literal.create(new Date(d),
DateType)
- lazy val timestampLiteralGen: Gen[Literal] =
- for { t <- Arbitrary.arbLong.arbitrary } yield Literal.create(new
Timestamp(t), TimestampType)
+ lazy val timestampLiteralGen: Gen[Literal] = {
+ // Catalyst's Timestamp type stores number of microseconds since epoch in
+ // a variable of Long type. To prevent arithmetic overflow of Long on
+ // conversion from milliseconds to microseconds, the range of random
milliseconds
+ // since epoch is restricted here.
+ val maxMillis = Long.MaxValue / DateTimeUtils.MICROS_PER_MILLIS
+ val minMillis = Long.MinValue / DateTimeUtils.MICROS_PER_MILLIS
+ for { millis <- Gen.choose(minMillis, maxMillis) }
+ yield Literal.create(new Timestamp(millis), TimestampType)
+ }
lazy val calendarIntervalLiterGen: Gen[Literal] =
for { m <- Arbitrary.arbInt.arbitrary; s <- Arbitrary.arbLong.arbitrary}
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]