[ https://issues.apache.org/jira/browse/SPARK-19342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Felix Cheung updated SPARK-19342:
---------------------------------
Target Version/s: 2.1.1, 2.2.0
Fix Version/s: 2.1.1, 2.2.0
> Datatype timestamp is converted to numeric in collect method
> -------------------------------------------------------------
>
> Key: SPARK-19342
> URL: https://issues.apache.org/jira/browse/SPARK-19342
> Project: Spark
> Issue Type: Bug
> Components: SparkR
> Affects Versions: 2.1.0
> Reporter: Fangzhou Yang
> Fix For: 2.1.1, 2.2.0
>
>
> collect() returns double instead of POSIXct for a timestamp column when an
> NA exists at the top of the column.
> The following code and output show how the bug can be reproduced:
> {code}
> > sparkR.session(master = "local")
> Spark package found in SPARK_HOME: /home/titicaca/spark-2.1
> Launching java with spark-submit command /home/titicaca/spark-2.1/bin/spark-submit sparkr-shell /tmp/RtmpqmpZUg/backend_port363a898be92
> Java ref type org.apache.spark.sql.SparkSession id 1
> > df <- data.frame(col1 = c(0, 1, 2),
> +                  col2 = c(as.POSIXct("2017-01-01 00:00:01"), NA,
> +                           as.POSIXct("2017-01-01 12:00:01")))
> > sdf1 <- createDataFrame(df)
> > print(dtypes(sdf1))
> [[1]]
> [1] "col1" "double"
> [[2]]
> [1] "col2" "timestamp"
> > df1 <- collect(sdf1)
> > print(lapply(df1, class))
> $col1
> [1] "numeric"
> $col2
> [1] "POSIXct" "POSIXt"
> > sdf2 <- filter(sdf1, "col1 > 0")
> > print(dtypes(sdf2))
> [[1]]
> [1] "col1" "double"
> [[2]]
> [1] "col2" "timestamp"
> > df2 <- collect(sdf2)
> > print(lapply(df2, class))
> $col1
> [1] "numeric"
> $col2
> [1] "numeric"
> {code}
> As we can see, the data type of col2 is unexpectedly converted to numeric in
> the collected local data frame df2.
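> A possible interim workaround, until a fixed release is available, is to restore
> the column type on the R side after collect(). This is only a sketch, not the
> actual fix: it assumes the mis-typed values are seconds since the Unix epoch, and
> the timezone used here (UTC) is also an assumption that may need adjusting:
> {code}
> # Sketch of a client-side workaround: if collect() returned the timestamp
> # column as plain numeric, convert it back to POSIXct manually.
> # Assumption: the numeric values are seconds since the Unix epoch; verify
> # against a known row before relying on this.
> df2 <- collect(sdf2)
> if (is.numeric(df2$col2)) {
>   df2$col2 <- as.POSIXct(df2$col2, origin = "1970-01-01", tz = "UTC")
> }
> print(lapply(df2, class))
> {code}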