Repository: spark
Updated Branches:
  refs/heads/master 6f0988b12 -> 12a89e55c
[SPARK-17035] [SQL] [PYSPARK] Improve Timestamp not to lose precision for all cases

## What changes were proposed in this pull request?

`PySpark` loses `microsecond` precision in some corner cases when converting a `Timestamp` into a `Long`. For example, the `datetime.max` value below should be converted to a value whose last six digits are '999999'. This PR improves the conversion logic so that precision is not lost in any case.

**Corner case**
```python
>>> datetime.datetime.max
datetime.datetime(9999, 12, 31, 23, 59, 59, 999999)
```

**Before**
```python
>>> from datetime import datetime
>>> from pyspark.sql import Row
>>> from pyspark.sql.types import StructType, StructField, TimestampType
>>> schema = StructType([StructField("dt", TimestampType(), False)])
>>> [schema.toInternal(row) for row in [{"dt": datetime.max}]]
[(253402329600000000,)]
```

**After**
```python
>>> [schema.toInternal(row) for row in [{"dt": datetime.max}]]
[(253402329599999999,)]
```

## How was this patch tested?

Pass the Jenkins tests with a new test case.

Author: Dongjoon Hyun <dongj...@apache.org>

Closes #14631 from dongjoon-hyun/SPARK-17035.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/12a89e55
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/12a89e55
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/12a89e55

Branch: refs/heads/master
Commit: 12a89e55cbd630fa2986da984e066cd07d3bf1f7
Parents: 6f0988b
Author: Dongjoon Hyun <dongj...@apache.org>
Authored: Tue Aug 16 10:01:30 2016 -0700
Committer: Davies Liu <davies....@gmail.com>
Committed: Tue Aug 16 10:01:30 2016 -0700

----------------------------------------------------------------------
 python/pyspark/sql/tests.py | 5 +++++
 python/pyspark/sql/types.py | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/12a89e55/python/pyspark/sql/tests.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/tests.py b/python/pyspark/sql/tests.py
index 520b09d..fc41701 100644
--- a/python/pyspark/sql/tests.py
+++ b/python/pyspark/sql/tests.py
@@ -178,6 +178,11 @@ class DataTypeTests(unittest.TestCase):
         dt = DateType()
         self.assertEqual(dt.fromInternal(0), datetime.date(1970, 1, 1))
 
+    # regression test for SPARK-17035
+    def test_timestamp_microsecond(self):
+        tst = TimestampType()
+        self.assertEqual(tst.toInternal(datetime.datetime.max) % 1000000, 999999)
+
     def test_empty_row(self):
         row = Row()
         self.assertEqual(len(row), 0)

http://git-wip-us.apache.org/repos/asf/spark/blob/12a89e55/python/pyspark/sql/types.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/types.py b/python/pyspark/sql/types.py
index b765472..11b1e60 100644
--- a/python/pyspark/sql/types.py
+++ b/python/pyspark/sql/types.py
@@ -189,7 +189,7 @@ class TimestampType(AtomicType):
         if dt is not None:
             seconds = (calendar.timegm(dt.utctimetuple()) if dt.tzinfo
                        else time.mktime(dt.timetuple()))
-            return int(seconds * 1e6 + dt.microsecond)
+            return int(seconds) * 1000000 + dt.microsecond
 
     def fromInternal(self, ts):
         if ts is not None:
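Editorial note, not part of the commit: the one-line fix in `types.py` works because the old code did the arithmetic in floating point. `seconds * 1e6` promotes to a 64-bit double, whose 53-bit mantissa cannot represent every integer above 2**53; near the magnitude of `datetime.max` (about 2.5e17) the spacing between representable doubles is 32, so adding `dt.microsecond` rounds to the wrong integer. The new code stays in Python's arbitrary-precision integers. A minimal sketch reproducing the rounding (the `seconds` value assumes `datetime.max` is interpreted as UTC; the real code path via `time.mktime` is timezone-dependent):

```python
# Sketch: why the float path drops the last microsecond for datetime.max.
seconds = 253402329599   # assumed UTC seconds for 9999-12-31 23:59:59
micros = 999999

# Old path: seconds * 1e6 is a float64. Near 2.5e17 the gap between
# representable doubles is 32, so ...599999999 rounds up to ...600000000.
old = int(seconds * 1e6 + micros)
print(old)                         # 253402329600000000  (off by one)

# New path: pure integer arithmetic is exact at any magnitude.
new = int(seconds) * 1000000 + micros
print(new)                         # 253402329599999999  (exact)

assert new % 1000000 == micros     # the property the regression test checks
```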