Frank Oosterhuis created SPARK-33632:
----------------------------------------

             Summary: to_date doesn't behave as documented
                 Key: SPARK-33632
                 URL: https://issues.apache.org/jira/browse/SPARK-33632
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.0.1
            Reporter: Frank Oosterhuis


I'm trying to use to_date on a string formatted as "10/31/20".
Expected output is "2020-10-31".
Actual output is "0020-01-31".

The [documentation|https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html] suggests 2020 or 20 as input for "y".

A reproduction follows. The "udf" column shows the expected behaviour.

{code:scala}
import java.sql.Date

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{to_date, udf}

object ToDate {
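  // Expected behaviour: split "month/day/year" by hand and read the
  // two-digit year as 20xx.
  val toDate = udf((date: String) => {
    val split = date.split("/")
    val month = "%02d".format(split(0).toInt)
    val day = "%02d".format(split(1).toInt)
    val year = split(2).toInt + 2000 // two-digit year, base 2000

    Date.valueOf(s"${year}-${month}-${day}")
  })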

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").getOrCreate()
    spark.sparkContext.setLogLevel("ERROR")
    import spark.implicits._

    Seq("1/1/20", "10/31/20")
      .toDF("raw")
      .withColumn("to_date", to_date($"raw", "m/d/y"))
      .withColumn("udf", toDate($"raw"))
      .show
  }
}

{code}
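
For comparison, here is a minimal sketch of the same call with a different pattern. It assumes the pattern table in the linked documentation applies as written: lowercase "m" is minute-of-hour, uppercase "M" is month-of-year, and a two-letter "yy" is a reduced two-digit year parsed relative to 2000. The object name ToDatePatterns and the column names are made up for illustration. If those assumptions hold on 3.0.1, the "M_d_yy" column should match the udf output above, while "m_d_y" reproduces the reported result.

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.to_date

object ToDatePatterns {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").getOrCreate()
    spark.sparkContext.setLogLevel("ERROR")
    import spark.implicits._

    Seq("1/1/20", "10/31/20")
      .toDF("raw")
      // Pattern from the report: "m" reads the first field as minutes,
      // so no month is parsed, and a single "y" takes 20 as the literal year.
      .withColumn("m_d_y", to_date($"raw", "m/d/y"))
      // Assumed alternative: "M" for month, "yy" for a two-digit year.
      .withColumn("M_d_yy", to_date($"raw", "M/d/yy"))
      .show()

    spark.stop()
  }
}
{code}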



