Repository: spark
Updated Branches:
  refs/heads/master 6f0988b12 -> 12a89e55c
[SPARK-17035] [SQL] [PYSPARK] Improve Timestamp not to lose precision for all cases

## What changes were proposed in this pull request?

`PySpark` loses `microsecond` precision in some corner cases when converting a `Timestamp` into a `Long`. For example, the `datetime.max` value below should be converted to a value whose last six digits are '999999'. This PR improves the conversion logic so that precision is not lost in any case.

**Corner case**
```python
>>> datetime.datetime.max
datetime.datetime(9999, 12, 31, 23, 59, 59, 999999)
```

**Before**
```python
>>> from datetime import datetime
>>> from pyspark.sql import Row
>>> from pyspark.sql.types import StructType, StructField, TimestampType
>>> schema = StructType([StructField("dt", TimestampType(), False)])
>>> [schema.toInternal(row) for row in [{"dt": datetime.max}]]
[(253402329600000000,)]
```

**After**
```python
>>> [schema.toInternal(row) for row in [{"dt": datetime.max}]]
[(253402329599999999,)]
```

## How was this patch tested?

Pass the Jenkins tests with a new test case.

Author: Dongjoon Hyun <dongj...@apache.org>

Closes #14631 from dongjoon-hyun/SPARK-17035.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/12a89e55
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/12a89e55
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/12a89e55

Branch: refs/heads/master
Commit: 12a89e55cbd630fa2986da984e066cd07d3bf1f7
Parents: 6f0988b
Author: Dongjoon Hyun <dongj...@apache.org>
Authored: Tue Aug 16 10:01:30 2016 -0700
Committer: Davies Liu <davies....@gmail.com>
Committed: Tue Aug 16 10:01:30 2016 -0700

----------------------------------------------------------------------
 python/pyspark/sql/tests.py | 5 +++++
 python/pyspark/sql/types.py | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/12a89e55/python/pyspark/sql/tests.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/tests.py b/python/pyspark/sql/tests.py
index 520b09d..fc41701 100644
--- a/python/pyspark/sql/tests.py
+++ b/python/pyspark/sql/tests.py
@@ -178,6 +178,11 @@ class DataTypeTests(unittest.TestCase):
         dt = DateType()
         self.assertEqual(dt.fromInternal(0), datetime.date(1970, 1, 1))
 
+    # regression test for SPARK-17035
+    def test_timestamp_microsecond(self):
+        tst = TimestampType()
+        self.assertEqual(tst.toInternal(datetime.datetime.max) % 1000000, 999999)
+
     def test_empty_row(self):
         row = Row()
         self.assertEqual(len(row), 0)

http://git-wip-us.apache.org/repos/asf/spark/blob/12a89e55/python/pyspark/sql/types.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/types.py b/python/pyspark/sql/types.py
index b765472..11b1e60 100644
--- a/python/pyspark/sql/types.py
+++ b/python/pyspark/sql/types.py
@@ -189,7 +189,7 @@ class TimestampType(AtomicType):
         if dt is not None:
             seconds = (calendar.timegm(dt.utctimetuple()) if dt.tzinfo
                        else time.mktime(dt.timetuple()))
-            return int(seconds * 1e6 + dt.microsecond)
+            return int(seconds) * 1000000 + dt.microsecond
 
     def fromInternal(self, ts):
         if ts is not None:
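Editorial note, not part of the commit: the one-line fix in `types.py` works because the old code did the arithmetic in floating point. `seconds * 1e6` promotes to a 64-bit double, whose 53-bit mantissa cannot represent every integer above 2**53; near the magnitude of `datetime.max` (about 2.5e17) the spacing between representable doubles is 32, so adding `dt.microsecond` rounds to the wrong integer. The new code stays in Python's arbitrary-precision integers. A minimal sketch reproducing the rounding (the `seconds` value assumes `datetime.max` is interpreted as UTC; the real code path via `time.mktime` is timezone-dependent):

```python
# Sketch: why the float path drops the last microsecond for datetime.max.
seconds = 253402329599   # assumed UTC seconds for 9999-12-31 23:59:59
micros = 999999

# Old path: seconds * 1e6 is a float64. Near 2.5e17 the gap between
# representable doubles is 32, so ...599999999 rounds up to ...600000000.
old = int(seconds * 1e6 + micros)
print(old)                         # 253402329600000000  (off by one)

# New path: pure integer arithmetic is exact at any magnitude.
new = int(seconds) * 1000000 + micros
print(new)                         # 253402329599999999  (exact)

assert new % 1000000 == micros     # the property the regression test checks
```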