Javier Luraschi created SPARK-25902:
---------------------------------------

             Summary: Support for dates with milliseconds in Arrow bindings
                 Key: SPARK-25902
                 URL: https://issues.apache.org/jira/browse/SPARK-25902
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.3.2
            Reporter: Javier Luraschi


Currently, Spark's Apache Arrow bindings (built on Arrow's Java library) only support Arrow `Date` 
values whose unit is `DateUnit.DAY`; see 
[ArrowUtils.scala#L72|https://github.com/apache/spark/blob/8c2edf46d0f89e5ec54968218d89f30a3f8190bc/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowUtils.scala#L72].
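The two units encode the same calendar day differently. Arrow's `Date(DAY)` stores a 32-bit count of days since the UNIX epoch, while `Date(MILLISECOND)` stores a 64-bit count of milliseconds since the epoch. Spark's `DateType` is also day-based internally. The minimal Java sketch below shows the conversion a millisecond-based date would need before it fits a day-based type. The class and method names (`DateUnits`, `millisToDays`, `daysToMillis`) are hypothetical and are not Spark or Arrow API.

{code:java}
import java.time.LocalDate;

// Hypothetical helper, not part of Spark or Arrow: models Date(MILLISECOND) <-> Date(DAY).
public class DateUnits {
    static final long MILLIS_PER_DAY = 86_400_000L;

    // Date(MILLISECOND) -> Date(DAY). floorDiv keeps pre-1970 instants on the correct day;
    // any time-of-day component is dropped.
    static int millisToDays(long epochMillis) {
        return Math.toIntExact(Math.floorDiv(epochMillis, MILLIS_PER_DAY));
    }

    // Date(DAY) -> Date(MILLISECOND): midnight UTC of the given day.
    static long daysToMillis(int epochDays) {
        return epochDays * MILLIS_PER_DAY;
    }

    public static void main(String[] args) {
        long millis = 1_540_000_000_000L; // an instant on 2018-10-20 (UTC)
        int days = millisToDays(millis);
        System.out.println(days + " -> " + LocalDate.ofEpochDay(days));
        System.out.println(daysToMillis(days));
    }
}
{code}

Using `Math.floorDiv` rather than `/` matters for instants before 1970. For example, `-1` ms maps to day `-1` (1969-12-31) instead of day `0`.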

However, the Arrow bindings for `R` used by Spark (through `sparklyr`) are adding support 
for mapping `POSIXct` to Arrow `Date` with `DateUnit.MILLISECOND`. As a result, running 
the code below triggers the warning that follows it:

{code:r}
devtools::install_github("apache/arrow", subdir = "r")
devtools::install_github("rstudio/sparklyr", ref = "feature/arrow")
Sys.setenv("SPARK_HOME_VERSION" = "2.3.2")

library(sparklyr)
library(arrow)
sc <- spark_connect(master = "local", spark_home = "<path-to-spark-sources>")

dates <- data.frame(dates = c(
 as.POSIXlt(Sys.time(), "GMT"),
 as.POSIXlt(Sys.time(), "EST"))
)

dates_tbl <- sdf_copy_to(sc, dates, overwrite = TRUE)
{code}

{noformat}
Arrow disabled due to columns: dates
{noformat}
This means that Arrow serialization is disabled because Spark throws the 
following exception:
{code:java}
java.lang.UnsupportedOperationException: Unsupported data type: Date(MILLISECOND)
{code}
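To make the failure mode concrete, here is a self-contained Java model of the kind of type dispatch that raises `Unsupported data type: Date(MILLISECOND)`. It is a hypothetical stand-in, not the real `ArrowUtils` code: `DateTypeMapping`, `ArrowDateUnit`, `currentMapping`, and `extendedMapping` are invented for this sketch, and Spark types are represented as plain strings. `extendedMapping` shows one possible option, accepting `MILLISECOND` and mapping it to `DateType`.

{code:java}
// Hypothetical model of an Arrow-to-Spark date type dispatch; it uses no real Arrow or Spark classes.
public class DateTypeMapping {
    // Stand-in for Arrow's date units.
    enum ArrowDateUnit { DAY, MILLISECOND }

    // Models the current behaviour: only Date(DAY) maps to Spark's DateType.
    static String currentMapping(ArrowDateUnit unit) {
        switch (unit) {
            case DAY:
                return "DateType";
            default:
                throw new UnsupportedOperationException(
                        "Unsupported data type: Date(" + unit + ")");
        }
    }

    // Hypothetical extension: Date(MILLISECOND) also maps to DateType.
    // This model only returns the type name; a real implementation would also
    // have to convert the stored millisecond values to days when reading them.
    static String extendedMapping(ArrowDateUnit unit) {
        switch (unit) {
            case DAY:
            case MILLISECOND:
                return "DateType";
            default:
                throw new UnsupportedOperationException(
                        "Unsupported data type: Date(" + unit + ")");
        }
    }

    public static void main(String[] args) {
        System.out.println(extendedMapping(ArrowDateUnit.MILLISECOND));
        try {
            currentMapping(ArrowDateUnit.MILLISECOND);
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
{code}

Mapping to `DateType` would drop the time of day carried by `POSIXct` values. Mapping to Spark's `TimestampType` instead would preserve it. Which option is preferable is a design decision for the fix, and this sketch does not settle it.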

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
