[ https://issues.apache.org/jira/browse/SPARK-17781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563193#comment-15563193 ]
Hossein Falaki commented on SPARK-17781:
----------------------------------------

I investigated the issue. The root cause is that Date (and Timestamp) values are converted to their underlying numeric representations when they are put in a list. To see it, run the following simple test in an R REPL:
{code}
> l <- lapply(1:2, function(x) { Sys.Date() })
> print(paste("list values", l))
[1] "list values 17084" "list values 17084"
{code}

A similar problem occurs with the POSIXlt and POSIXct types. Therefore, in {{worker.R}}, when we call {{computeFunc(inputData)}} we are dealing with a list that contains double values for the date fields.

For now, the safe workaround seems to be to avoid Date and Time types and use String instead.

[~shivaram] and [~felixcheung] do you have any ideas?

> datetime is serialized as double inside dapply()
> ------------------------------------------------
>
>                 Key: SPARK-17781
>                 URL: https://issues.apache.org/jira/browse/SPARK-17781
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.0.0
>            Reporter: Hossein Falaki
>
> When we ship a SparkDataFrame to workers for dapply family functions, inside
> the worker DateTime objects are serialized as double.
> To reproduce:
> {code}
> df <- createDataFrame(data.frame(id = 1:10, date = Sys.Date()))
> dapplyCollect(df, function(x) { return(x$date) })
> {code}
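
The String workaround mentioned in the comment can be sketched in plain R, with no Spark session involved. This is only an illustration under stated assumptions: {{unlist()}} is used here as a convenient way to drop the Date class, and it is not claimed to be the code path that {{worker.R}} actually takes. The variable names and the fixed date are hypothetical.

```r
# Plain R sketch; does not touch SparkR or worker.R.
d <- as.Date("2016-10-10")

# Flattening a list of Date values drops the class attribute,
# leaving the underlying day count as a double.
l <- list(d, d)
flat <- unlist(l)
print(class(flat))
print(flat[1])

# Workaround A (the one suggested in the comment): carry the value
# as a string and parse it back where a Date is needed.
s <- format(d, "%Y-%m-%d")
restored_from_string <- as.Date(s, format = "%Y-%m-%d")

# Workaround B (an alternative, not from the comment): rebuild the
# Date from the day count using the Unix epoch as origin.
restored_from_double <- as.Date(flat[1], origin = "1970-01-01")

print(restored_from_string == d)
print(restored_from_double == d)
```

Inside a dapply function the same idea would amount to passing the column as character and calling {{as.Date()}} on it in the worker, at the cost of one extra parse per value.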