Killian created SPARK-35123:
-------------------------------

             Summary: read partitioned parquet: my_col=NOW replaced by 
<current_date> on read()
                 Key: SPARK-35123
                 URL: https://issues.apache.org/jira/browse/SPARK-35123
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.1.1
            Reporter: Killian


When reading a Parquet dataset partitioned by a column that contains the value "NOW", 
the value is interpreted as now() and replaced by the current timestamp at the 
moment read() is executed.
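{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Each row is a 1-tuple so schema inference yields a single string column.
df = spark.createDataFrame(data=[("NOW",), ("TEST",)], schema=["col1"])

# Writes one directory per value, e.g. /home/test/test.parquet/col1=NOW
df.write.partitionBy("col1").parquet("/home/test/test.parquet")

df_loaded = spark.read.option(
    "basePath",
    "/home/test/test.parquet",
).parquet("/home/test/test.parquet/col1=*")
df_loaded.show(truncate=False)
# +--------------------------+
# |col1                      |
# +--------------------------+
# |2021-04-18 12:49:13.590431|
# |TEST                      |
# +--------------------------+
{code}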
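One plausible explanation, consistent with the output above, is that partition-column type inference tries to parse each directory value as a timestamp, and that timestamp parsing in Spark 3.x accepts special strings such as "now". The value "NOW" would then become the current time, and since "TEST" cannot be parsed, the column would fall back to a string type, rendering the timestamp as text. The sketch below is a hypothetical, simplified Python model of that mechanism, not Spark's implementation; `infer_value`, `resolve_column`, `SPECIAL_TIMESTAMPS` and the fixed `now` value are illustrative names, and the "mixed types fall back to string" rule is an assumption.

```python
from datetime import datetime

# Hypothetical, simplified model of partition-value type inference.
# NOT Spark's code; it only mirrors the behaviour reported in this issue.
SPECIAL_TIMESTAMPS = {"now"}  # assumption: only "now" is modelled here


def infer_value(raw, now):
    """Return (type_name, value) for one partition directory value."""
    try:
        return "int", int(raw)
    except ValueError:
        pass
    # Special string: replaced by the time at which the read happens.
    if raw.strip().lower() in SPECIAL_TIMESTAMPS:
        return "timestamp", now
    try:
        return "timestamp", datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return "string", raw


def resolve_column(raw_values, now):
    """Infer each value, then cast all values to one common column type."""
    inferred = [infer_value(v, now) for v in raw_values]
    types = {t for t, _ in inferred}
    if len(types) == 1:
        return types.pop(), [v for _, v in inferred]

    # Assumption: mixed types fall back to string. The timestamp is rendered
    # as text, so the original literal "NOW" is lost.
    def to_str(t, v):
        if t == "timestamp":
            return v.strftime("%Y-%m-%d %H:%M:%S.%f")
        return str(v)

    return "string", [to_str(t, v) for t, v in inferred]


now = datetime(2021, 4, 18, 12, 49, 13, 590431)  # fixed "read time" for the demo
print(resolve_column(["NOW", "TEST"], now))
# ('string', ['2021-04-18 12:49:13.590431', 'TEST'])
```

In this model the information loss happens at inference time, before the column type is merged, which is why the string "NOW" cannot be recovered afterwards.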
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

