Chris Martin created SPARK-34259:
------------------------------------
Summary: Reading a partitioned dataset with a partition value of
NOW causes the value to be parsed as a timestamp.
Key: SPARK-34259
URL: https://issues.apache.org/jira/browse/SPARK-34259
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.0.1
Reporter: Chris Martin
*Problem*
Reading a partitioned dataset in which a partition column value matches a
special timestamp literal (NOW, TODAY, etc.) causes that value to be
interpreted as a timestamp rather than a string.
*Example Code (Scala)*
{code:java}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object TestBug {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    // Write a single row whose partition column holds the string "NOW".
    val df = spark.range(1, 2).withColumn("partition", lit("NOW"))
    df.write.mode("overwrite").partitionBy("partition").parquet("bug")
    // Reading it back infers the partition column type from the directory name.
    spark.read.parquet("bug").show(truncate = false)
  }
}
{code}
The above program prints out:
{noformat}
+---+--------------------------+
|id |partition                 |
+---+--------------------------+
|1  |2021-01-27 08:53:23.650039|
+---+--------------------------+
{noformat}
*Analysis*
This happens because PartitioningUtils.inferPartitionColumnValue tries to cast
the partition value to a timestamp in order to determine whether a timestamp is
a valid interpretation. Because NOW and the other special values are literals
that can validly be cast to timestamps, the code ends up interpreting the value
as a timestamp.
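To make the failure mode concrete, here is a rough, self-contained model of the inference order. It is only a sketch and not Spark's actual code: InferenceSketch, castToTimestamp and infer are hypothetical names, the timestamp format is an assumption, and only two special values are modelled.

```scala
import java.time.{Instant, LocalDate, LocalDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter
import java.util.Locale
import scala.util.Try

// Hypothetical, simplified model of partition value inference (not Spark source).
object InferenceSketch {
  sealed trait Inferred
  final case class AsLong(value: Long) extends Inferred
  final case class AsTimestamp(value: Instant) extends Inferred
  final case class AsString(value: String) extends Inferred

  // Assumed timestamp layout, for illustration only.
  private val format = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Stand-in for a cast that also accepts special timestamp literals.
  def castToTimestamp(raw: String): Option[Instant] = {
    val trimmed = raw.trim
    trimmed.toLowerCase(Locale.ROOT) match {
      case "now"   => Some(Instant.now())
      case "today" => Some(LocalDate.now(ZoneOffset.UTC).atStartOfDay(ZoneOffset.UTC).toInstant)
      case _       => Try(LocalDateTime.parse(trimmed, format).toInstant(ZoneOffset.UTC)).toOption
    }
  }

  // Try a numeric type first, then a timestamp, and only then fall back to string.
  def infer(raw: String): Inferred = {
    val asLong: Option[Inferred] = Try(raw.toLong).toOption.map(v => AsLong(v))
    val asTimestamp: Option[Inferred] = castToTimestamp(raw).map(v => AsTimestamp(v))
    asLong.orElse(asTimestamp).getOrElse(AsString(raw))
  }
}
```

In this model, InferenceSketch.infer("NOW") yields an AsTimestamp rather than AsString("NOW"), because the timestamp attempt succeeds before the string fallback is ever reached.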
I think what we want to do here is change
PartitioningUtils.inferPartitionColumnValue so that it ignores the special
values when it attempts to interpret a value as a timestamp. This looks
difficult to do if we continue to use cast, but another option is to add an
option to DateTimeUtils.stringToDate that tells it to ignore special values,
and to use that for the conversion instead.
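A minimal sketch of what such an option could look like, written as standalone Scala rather than against Spark's code base. Everything here is an assumption for illustration: SpecialValueOptionSketch, parseTimestamp, the allowSpecialValues flag and inferPartitionValue are hypothetical names, and the real DateTimeUtils signatures differ.

```scala
import java.time.{Instant, LocalDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter
import java.util.Locale
import scala.util.Try

// Hypothetical sketch of a parser with an opt-out for special values.
object SpecialValueOptionSketch {
  // Assumed timestamp layout, for illustration only.
  private val format = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // When allowSpecialValues is false, literals such as NOW are not resolved
  // and must parse as an ordinary timestamp string, which they do not.
  def parseTimestamp(raw: String, allowSpecialValues: Boolean): Option[Instant] = {
    val trimmed = raw.trim
    val special: Option[Instant] =
      if (!allowSpecialValues) None
      else trimmed.toLowerCase(Locale.ROOT) match {
        case "now"   => Some(Instant.now())
        case "epoch" => Some(Instant.EPOCH)
        case _       => None
      }
    special.orElse(
      Try(LocalDateTime.parse(trimmed, format).toInstant(ZoneOffset.UTC)).toOption)
  }

  // Partition inference would disable special values, so NOW stays a string (Left).
  def inferPartitionValue(raw: String): Either[String, Instant] =
    parseTimestamp(raw, allowSpecialValues = false).toRight(raw)
}
```

With the flag disabled, inferPartitionValue("NOW") falls through to Left("NOW"), while an explicit timestamp string is still recognised. Callers that want the existing cast behaviour would keep passing allowSpecialValues = true.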