[spark] branch master updated: [SPARK-27405][SQL][TEST] Restrict the range of generated random timestamps

dongjoon Mon, 08 Apr 2019 09:53:45 -0700

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new 0024173  [SPARK-27405][SQL][TEST] Restrict the range of generated random timestamps
0024173 is described below

commit 00241733a6d7761a4f6e64442319b11dd9d2b57b
Author: Maxim Gekk <[email protected]>
AuthorDate: Mon Apr 8 09:53:00 2019 -0700

    [SPARK-27405][SQL][TEST] Restrict the range of generated random timestamps
    
    ## What changes were proposed in this pull request?
    
    In the PR, I propose to restrict the range of random timestamp literals generated in `LiteralGenerator.timestampLiteralGen`. The generator creates instances of `java.sql.Timestamp` by passing milliseconds since the epoch as a `Long`. Converting the milliseconds to microseconds can overflow `Long`, because Catalyst's Timestamp type also stores microseconds since the epoch in a `Long` internally. The proposed interval of random milliseconds is `[Long.MinValue / 1000, [...]
    
    For example, the generated timestamp `new java.sql.Timestamp(-3948373668011580000)` causes `Long` overflow in the method:
    ```scala
      def fromJavaTimestamp(t: Timestamp): SQLTimestamp = {
      ...
          MILLISECONDS.toMicros(t.getTime()) + NANOSECONDS.toMicros(t.getNanos()) % NANOS_PER_MICROS
      ...
      }
    ```
    because `t.getTime()` returns `-3948373668011580000`, which is multiplied by `1000` in `MILLISECONDS.toMicros`, and the result `-3948373668011580000000` is less than `Long.MinValue`.
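    
    The arithmetic above can be checked with a minimal sketch that is not part of the patch. The object `MillisToMicrosOverflow`, the helper `millisToMicrosChecked`, and the constant `MicrosPerMillis` are hypothetical names introduced only for illustration; the sketch assumes the conversion is a plain multiplication by `1000` and uses `Math.multiplyExact` to surface the overflow instead of letting it wrap silently.
    
    ```scala
    object MillisToMicrosOverflow {
      // Assumption for this sketch: 1000 microseconds per millisecond.
      val MicrosPerMillis = 1000L
    
      // Returns None when millis * 1000 does not fit in a Long.
      def millisToMicrosChecked(millis: Long): Option[Long] =
        try Some(Math.multiplyExact(millis, MicrosPerMillis))
        catch { case _: ArithmeticException => None }
    
      def main(args: Array[String]): Unit = {
        // The value from the commit message: valid as a Long of milliseconds,
        // but out of range once scaled to microseconds.
        val millis = -3948373668011580000L
        println(millisToMicrosChecked(millis)) // prints None
    
        // A value inside the restricted range converts without overflow.
        println(millisToMicrosChecked(Long.MinValue / MicrosPerMillis).isDefined) // prints true
      }
    }
    ```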
    
    ## How was this patch tested?
    
    By `DateExpressionsSuite` in the PR https://github.com/apache/spark/pull/24311
    
    Closes #24316 from MaxGekk/random-timestamps-gen.
    
    Authored-by: Maxim Gekk <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 .../spark/sql/catalyst/expressions/LiteralGenerator.scala   | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala
index 032aec0..be5fdb5 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala
@@ -21,6 +21,7 @@ import java.sql.{Date, Timestamp}
 
 import org.scalacheck.{Arbitrary, Gen}
 
+import org.apache.spark.sql.catalyst.util.DateTimeUtils
 import org.apache.spark.sql.types._
 import org.apache.spark.unsafe.types.CalendarInterval
 
@@ -102,8 +103,16 @@ object LiteralGenerator {
   lazy val dateLiteralGen: Gen[Literal] =
     for { d <- Arbitrary.arbInt.arbitrary } yield Literal.create(new Date(d), DateType)
 
-  lazy val timestampLiteralGen: Gen[Literal] =
-    for { t <- Arbitrary.arbLong.arbitrary } yield Literal.create(new Timestamp(t), TimestampType)
+  lazy val timestampLiteralGen: Gen[Literal] = {
+    // Catalyst's Timestamp type stores number of microseconds since epoch in
+    // a variable of Long type. To prevent arithmetic overflow of Long on
+    // conversion from milliseconds to microseconds, the range of random milliseconds
+    // since epoch is restricted here.
+    val maxMillis = Long.MaxValue / DateTimeUtils.MICROS_PER_MILLIS
+    val minMillis = Long.MinValue / DateTimeUtils.MICROS_PER_MILLIS
+    for { millis <- Gen.choose(minMillis, maxMillis) }
+      yield Literal.create(new Timestamp(millis), TimestampType)
+  }
 
   lazy val calendarIntervalLiterGen: Gen[Literal] =
     for { m <- Arbitrary.arbInt.arbitrary; s <- Arbitrary.arbLong.arbitrary}
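
The bounds chosen in the patch can be sanity-checked without ScalaCheck. The following is only a sketch: `TimestampMillisRange`, `fitsInMicros`, and the hard-coded `MicrosPerMillis` are hypothetical stand-ins (the patch itself reads `DateTimeUtils.MICROS_PER_MILLIS`), and the check covers only the millisecond-to-microsecond multiplication.

```scala
object TimestampMillisRange {
  // Assumed to equal DateTimeUtils.MICROS_PER_MILLIS; hard-coded so the
  // sketch stays self-contained.
  val MicrosPerMillis = 1000L

  // Same bounds as in the patch: integer division truncates toward zero,
  // so both endpoints stay inside the representable microsecond range.
  val maxMillis: Long = Long.MaxValue / MicrosPerMillis
  val minMillis: Long = Long.MinValue / MicrosPerMillis

  // True when millis lies in the restricted range used by the generator.
  def fitsInMicros(millis: Long): Boolean =
    millis >= minMillis && millis <= maxMillis

  def main(args: Array[String]): Unit = {
    // multiplyExact throws ArithmeticException on overflow, so reaching
    // the final println shows that both endpoints convert safely.
    Math.multiplyExact(minMillis, MicrosPerMillis)
    Math.multiplyExact(maxMillis, MicrosPerMillis)
    println(fitsInMicros(maxMillis + 1)) // prints false
  }
}
```

One step outside either bound is enough to overflow, which is why the generator draws from the closed interval between `minMillis` and `maxMillis`.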


