Gaelan Mines created SPARK-34292:
------------------------------------

             Summary: NOW is interpreted as the NOW SQL function
                 Key: SPARK-34292
                 URL: https://issues.apache.org/jira/browse/SPARK-34292
             Project: Spark
          Issue Type: Bug
          Components: PySpark, Spark Core
    Affects Versions: 3.0.0
            Reporter: Gaelan Mines


We believe we have found a bug in Spark. When reading a Parquet dataset that 
is partitioned by a column, if that column contains the string value "NOW", 
the value is interpreted as the SQL NOW function, so the read returns the 
current timestamp instead of the literal string "NOW".

 

Steps to reproduce:

from pyspark.sql.session import SparkSession

spark = SparkSession.builder.getOrCreate()

# Col1 is the partition column and holds the literal string 'NOW'.
df = spark.createDataFrame([['NOW', 1], ['THEN', 2]], schema=['Col1', 'Col2'])

# Write the data partitioned by Col1, then read it back.
df.write.parquet('/tmp/my_partitioned_data', mode='overwrite',
                 partitionBy=['Col1'])
df_read_back = spark.read.parquet('/tmp/my_partitioned_data')

"""
In [1]: df.show()
+----+----+
|Col1|Col2|
+----+----+
| NOW|   1|
|THEN|   2|
+----+----+

In [2]: df_read_back.show()
+----+--------------------+
|Col2|                Col1|
+----+--------------------+
|   1|2021-01-22 10:46:...|
|   2|                THEN|
+----+--------------------+
"""


