[ 
https://issues.apache.org/jira/browse/SPARK-17781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563193#comment-15563193
 ] 

Hossein Falaki commented on SPARK-17781:
----------------------------------------

I investigated the issue. The root cause is that Date (and Timestamp) values 
are converted to their underlying numeric representation when they are put in 
a list. To see it, run the following simple test in an R REPL:

{code}
> l <- lapply(1:2, function(x) { Sys.Date() })
> print(paste("list values", l))
[1] "list values 17084" "list values 17084"
{code}

A similar problem occurs with the POSIXlt and POSIXct types. Therefore, in 
{{worker.R}}, when we call {{computeFunc(inputData)}}, we are dealing with a 
list that contains double values for date fields. 
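
A minimal base-R sketch of the same effect, using a fixed date instead of 
{{Sys.Date()}} so the values are reproducible. It shows that flattening a list 
of Date values with {{unlist()}} leaves only the day count, and that the count 
can be turned back into a Date with the 1970-01-01 origin. Whether 
{{worker.R}} loses the class at exactly this step is an assumption and has not 
been verified here.

```r
# Fixed date (assumption: used instead of Sys.Date() for reproducibility).
d <- as.Date("2016-10-10")

# Build a list of Date values, as in the test above.
l <- lapply(1:2, function(x) d)

# Flattening the list drops the Date class and leaves the day count.
flat <- unlist(l)
print(class(flat))   # "numeric"
print(flat)          # 17084 17084

# The day count can be converted back using the Unix epoch as origin.
restored <- as.Date(flat, origin = "1970-01-01")
print(restored)
```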

For now, the safe workaround seems to be to avoid Date and Time types and use 
String instead. [~shivaram] and [~felixcheung], do you have any ideas?
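
The String workaround could look roughly like the base-R sketch below. The 
helper names {{dates_to_strings}} and {{strings_to_dates}} are hypothetical 
and not part of SparkR. The sketch only illustrates the round trip on a local 
data.frame: the first helper would run before {{createDataFrame()}}, the 
second at the start of the function passed to {{dapply()}}.

```r
# Hypothetical helper: convert a Date column to character before shipping.
dates_to_strings <- function(df, col) {
  df[[col]] <- as.character(df[[col]])
  df
}

# Hypothetical helper: parse the character column back into Date values.
strings_to_dates <- function(df, col) {
  df[[col]] <- as.Date(df[[col]], format = "%Y-%m-%d")
  df
}

# Local stand-in for the data that would be shipped to the workers.
local_df <- data.frame(id = 1:3, date = as.Date("2016-10-10") + 0:2)

# Step that would run on the driver, before createDataFrame().
shipped <- dates_to_strings(local_df, "date")

# Step that would run first inside the function given to dapply().
inside <- strings_to_dates(shipped, "date")
print(inside)
```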

> datetime is serialized as double inside dapply()
> ------------------------------------------------
>
>                 Key: SPARK-17781
>                 URL: https://issues.apache.org/jira/browse/SPARK-17781
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.0.0
>            Reporter: Hossein Falaki
>
> When we ship a SparkDataFrame to workers for dapply family functions, inside 
> the worker DateTime objects are serialized as double.
> To reproduce:
> {code}
> df <- createDataFrame(data.frame(id = 1:10, date = Sys.Date()))
> dapplyCollect(df, function(x) { return(x$date) })
> {code}


