Vivek Atal created SPARK-42005:
----------------------------------
Summary: SparkR cannot collect dataframe with NA in a date column
along with another timestamp column
Key: SPARK-42005
URL: https://issues.apache.org/jira/browse/SPARK-42005
Project: Spark
Issue Type: Bug
Components: SparkR
Affects Versions: 3.3.0
Reporter: Vivek Atal
This issue seems to be related to
https://issues.apache.org/jira/browse/SPARK-17811, which was resolved by
https://github.com/apache/spark/pull/15421.
If a Spark dataframe has a column of data type `date` that is entirely NA and
another column of data type `timestamp`, SparkR cannot collect that Spark
dataframe into an R dataframe.
A reproducible code snippet is below.
{code:java}
df <- data.frame(x = as.Date(NA), y = as.POSIXct("2022-01-01"))
SparkR::collect(SparkR::createDataFrame(df))
#> Error in handleErrors(returnStatus, conn): org.apache.spark.SparkException:
#>   Job aborted due to stage failure: Task 0 in stage 25.0 failed 1 times, most
#>   recent failure: Lost task 0.0 in stage 25.0 (TID 25)
#>   (ip-10-172-210-194.us-west-2.compute.internal executor driver):
#>   java.lang.IllegalArgumentException: Invalid type N
#> at org.apache.spark.api.r.SerDe$.readTypedObject(SerDe.scala:94)
#> at org.apache.spark.api.r.SerDe$.readObject(SerDe.scala:68)
#> at org.apache.spark.sql.api.r.SQLUtils$.$anonfun$bytesToRow$1(SQLUtils.scala:129)
#> at org.apache.spark.sql.api.r.SQLUtils$.$anonfun$bytesToRow$1$adapted(SQLUtils.scala:128)
#> at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
#> at scala.collection.immutable.Range.foreach(Range.scala:158)
#> ...{code}
This issue does not appear if the column of `date` data type is {_}not
missing{_}, or if there _does not exist_ any other column of data type
`timestamp`.
{code:java}
df <- data.frame(x = as.Date("2022-01-01"), y = as.POSIXct("2022-01-01"))
SparkR::collect(SparkR::createDataFrame(df))
#>            x          y
#> 1 2022-01-01 2022-01-01{code}
or
{code:java}
df <- data.frame(x = as.Date(NA), y = as.character("2022-01-01"))
SparkR::collect(SparkR::createDataFrame(df))
#>      x          y
#> 1 <NA> 2022-01-01{code}