Frank Oosterhuis created SPARK-33632:
----------------------------------------
Summary: to_date doesn't behave as documented
Key: SPARK-33632
URL: https://issues.apache.org/jira/browse/SPARK-33632
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.0.1
Reporter: Frank Oosterhuis
I'm trying to use to_date on a string formatted as "10/31/20".
Expected output is "2020-10-31".
Actual output is "0020-01-31".
The [documentation|https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html] suggests 2020 or 20 as input for "y".
A reproducing example follows. The expected behaviour is implemented in the udf.
{code:scala}
import java.sql.Date

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{to_date, udf}

object ToDate {
  // Expected behaviour: parse "month/day/two-digit-year" by hand.
  val toDate = udf((date: String) => {
    val split = date.split("/")
    val month = "%02d".format(split(0).toInt)
    val day = "%02d".format(split(1).toInt)
    val year = split(2).toInt + 2000
    Date.valueOf(s"${year}-${month}-${day}")
  })

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").getOrCreate()
    spark.sparkContext.setLogLevel("ERROR")
    import spark.implicits._

    Seq("1/1/20", "10/31/20")
      .toDF("raw")
      .withColumn("to_date", to_date($"raw", "m/d/y"))
      .withColumn("udf", toDate($"raw"))
      .show
  }
}
{code}
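For comparison, here is a minimal sketch outside Spark. It assumes that the pattern letters on the linked page follow the same conventions as java.time.format.DateTimeFormatter: "M" is month-of-year, "m" is minute-of-hour, and "yy" is a two-digit year. The object name PatternSketch and the pattern "M/d/yy" are illustrative and not part of the original report.
{code:scala}
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object PatternSketch {
  // Assumption: "M" = month-of-year, "d" = day-of-month,
  // "yy" = two-digit year resolved against base year 2000.
  val formatter: DateTimeFormatter = DateTimeFormatter.ofPattern("M/d/yy")

  // Parse a "month/day/two-digit-year" string into a LocalDate.
  def parse(raw: String): LocalDate = LocalDate.parse(raw, formatter)

  def main(args: Array[String]): Unit = {
    Seq("1/1/20", "10/31/20").foreach { raw =>
      println(s"$raw -> ${parse(raw)}")
    }
  }
}
{code}
If Spark applies the same conventions, the corresponding call would be to_date($"raw", "M/d/yy"). That has not been verified against 3.0.1 here.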