Chris Martin created SPARK-34259:
------------------------------------

             Summary: Reading a partitioned dataset with a partition value of 
NOW causes the value to be parsed as a timestamp.
                 Key: SPARK-34259
                 URL: https://issues.apache.org/jira/browse/SPARK-34259
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.0.1
            Reporter: Chris Martin


*Problem*

Reading a partitioned dataset where one of the partition column values matches 
a special timestamp string (NOW, TODAY, etc.) causes the value to be 
interpreted as a timestamp rather than a string.

*Example Code (Scala)*
{code:java}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object TestBug {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()

    val df = spark.range(1, 2).withColumn("partition", lit("NOW"))
    df.write.mode("overwrite").partitionBy("partition").parquet("bug")
    
    spark.read.parquet("bug").show(truncate = false)
  }

}
{code}
The above program prints:
{noformat}
+---+--------------------------+
|id |partition                 |
+---+--------------------------+
|1  |2021-01-27 08:53:23.650039|
+---+--------------------------+
{noformat}
 

*Analysis*

This happens because PartitioningUtils.inferPartitionColumnValue tries to cast 
the partition value to a timestamp in order to determine whether timestamp is a 
valid interpretation. Since NOW etc. are literals that can validly be cast to 
timestamps, the code ends up interpreting the value as a timestamp.

I think what we want to do here is change 
PartitioningUtils.inferPartitionColumnValue so that it ignores the special 
values when it attempts to interpret the value as a timestamp. This looks 
difficult to do if we continue to use cast, but another approach is to add an 
option to DateTimeUtils.stringToDate that tells it to ignore special values, 
and to use that for the conversion instead.
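
To illustrate the idea, here is a minimal sketch in plain Scala with no Spark 
dependency. Every name in it (PartitionInferenceSketch, specialValues, 
lenientParse, inferPartitionValue) is hypothetical and is not a Spark API. 
lenientParse is only a stand-in for a cast that accepts special values, and the 
list of special values is an assumption that should be checked against the 
Spark version in use.

```scala
import java.sql.Timestamp
import java.util.Locale
import scala.util.Try

object PartitionInferenceSketch {
  // Assumed list of special timestamp strings; verify against the Spark version in use.
  val specialValues: Set[String] =
    Set("epoch", "now", "today", "tomorrow", "yesterday")

  sealed trait Inferred
  final case class AsTimestamp(value: Timestamp) extends Inferred
  final case class AsString(value: String) extends Inferred

  // Stand-in for a cast-like parser: it accepts special values, which is the
  // behaviour that makes "NOW" look like a valid timestamp during inference.
  // For simplicity every special value resolves to the current time here.
  def lenientParse(raw: String): Option[Timestamp] = {
    val normalized = raw.trim.toLowerCase(Locale.ROOT)
    if (specialValues.contains(normalized)) {
      Some(new Timestamp(System.currentTimeMillis()))
    } else {
      // Timestamp.valueOf throws IllegalArgumentException on malformed input.
      Try(Timestamp.valueOf(raw.trim)).toOption
    }
  }

  // Guarded inference: special values are kept as strings before any
  // timestamp parsing is attempted.
  def inferPartitionValue(raw: String): Inferred = {
    val normalized = raw.trim.toLowerCase(Locale.ROOT)
    if (specialValues.contains(normalized)) {
      AsString(raw)
    } else {
      lenientParse(raw) match {
        case Some(ts) => AsTimestamp(ts)
        case None     => AsString(raw)
      }
    }
  }

  def main(args: Array[String]): Unit = {
    println(lenientParse("NOW").isDefined)             // true: unguarded parse accepts NOW
    println(inferPartitionValue("NOW"))                // AsString(NOW)
    println(inferPartitionValue("2021-01-27 08:53:23"))
  }
}
```

The guard sits in front of the parser rather than inside it, which mirrors the 
"ignore special values" option suggested above without changing how ordinary 
timestamp strings are handled.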


