Gaelan Mines created SPARK-34292:
------------------------------------
Summary: NOW is interpreted as the NOW SQL function
Key: SPARK-34292
URL: https://issues.apache.org/jira/browse/SPARK-34292
Project: Spark
Issue Type: Bug
Components: PySpark, Spark Core
Affects Versions: 3.0.0
Reporter: Gaelan Mines
We appear to have hit a bug in Spark. When reading a Parquet dataset that is
partitioned by a column, a partition value of "NOW" in that column is
interpreted as the SQL NOW function, so the column comes back holding the
current timestamp instead of the literal string "NOW".
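As a rough mental model of what seems to be happening, the sketch below tries progressively looser types on a raw partition value and falls back to a string only at the end. This is not Spark's actual code: the names `infer_partition_value`, `SPECIAL_TIMESTAMPS`, and `infer_types` are hypothetical, and the inference order is an assumption made for illustration.

```python
from datetime import datetime

# Hypothetical table of special strings that a timestamp parser might accept.
# Assumption for illustration only; not taken from Spark's source.
SPECIAL_TIMESTAMPS = {"now": datetime.now}


def infer_partition_value(raw, infer_types=True):
    """Guess a typed value from the raw string found in a partition path."""
    if not infer_types:
        # With inference off, the raw string is returned untouched.
        return raw
    try:
        return int(raw)
    except ValueError:
        pass
    # A case-insensitive match on a special timestamp string wins over the
    # plain-string fallback, which is how "NOW" could turn into a timestamp.
    special = SPECIAL_TIMESTAMPS.get(raw.strip().lower())
    if special is not None:
        return special()
    return raw


now_value = infer_partition_value("NOW")                     # a datetime, not "NOW"
then_value = infer_partition_value("THEN")                   # stays "THEN"
raw_value = infer_partition_value("NOW", infer_types=False)  # stays "NOW"
```

In this model only values that happen to match a special timestamp string are affected, which matches the observation that "THEN" survives the round trip while "NOW" does not.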
Steps to reproduce:
from pyspark.sql.session import SparkSession
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([['NOW', 1], ['THEN', 2]], schema=['Col1', 'Col2'])
df.write.parquet('/tmp/my_partitioned_data', mode='overwrite',
                 partitionBy=['Col1'])
df_read_back = spark.read.parquet('/tmp/my_partitioned_data')
"""
In [1]: df.show()
+----+----+
|Col1|Col2|
+----+----+
| NOW|   1|
|THEN|   2|
+----+----+

In [2]: df_read_back.show()
+----+--------------------+
|Col2|                Col1|
+----+--------------------+
|   1|2021-01-22 10:46:...|
|   2|                THEN|
+----+--------------------+
"""