[
https://issues.apache.org/jira/browse/SPARK-16785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427785#comment-15427785
]
Clark Fitzgerald commented on SPARK-16785:
------------------------------------------
Making some slow progress digging into this. Here's the failing test:
{code}
df_listcols <- data.frame(key = 1:3)
df_listcols$bytes <- lapply(df_listcols$key, serialize, connection = NULL)
df_listcols_spark <- createDataFrame(df_listcols)
result1 <- collect(df_listcols_spark)
expect_identical(df_listcols, result1)
result2 <- dapplyCollect(df_listcols_spark, function(x) x) # FAILS HERE
expect_equal(df_listcols, result2)
{code}
And the error message:
{code}
# R computation failed with
# Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
# arguments imply differing number of rows: 3, 26
# at org.apache.spark.api.r.RRunner.compute(RRunner.scala:108)
{code}
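An error of the same shape can be produced in plain R, without Spark. The sketch below is only an illustration built on an assumption, not a confirmed diagnosis of the SparkR internals: it guesses that a single raw vector ends up being treated as a whole column rather than as one cell of a list column. The names `good` and `msg` are made up for this example, and `version = 2` is an assumption chosen so that the serialized length does not depend on the R release.

```r
key <- 1:3

# One raw vector per key. version = 2 pins the serialization format
# (an assumption made for reproducibility, not taken from the failing test).
bytes <- lapply(key, serialize, connection = NULL, version = 2)

# Assigning with $<- stores the list as a list column: one raw vector per row.
good <- data.frame(key = key)
good$bytes <- bytes

# Passing a single raw vector as a column makes data.frame() treat every
# byte as its own row, so the row counts no longer agree with `key`.
msg <- tryCatch(
  {
    data.frame(key = key, bytes = bytes[[1]])
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

If the row count reported in `msg` equals `length(bytes[[1]])`, that would be consistent with the 26 in the error above being the byte length of one serialized integer, though this has not been verified against the SparkR code path.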
A separate but related issue with array columns, which [~shivaram] has mentioned,
is that the data frame can't be collected if this column is added:
{code}
df_listcols$arr <- lapply(df_listcols$key,
                          function(x) seq(0, 1, length.out = 15))
{code}
Will continue looking at this on Monday.
> dapply doesn't return array or raw columns
> ------------------------------------------
>
> Key: SPARK-16785
> URL: https://issues.apache.org/jira/browse/SPARK-16785
> Project: Spark
> Issue Type: Bug
> Components: SparkR
> Affects Versions: 2.0.0
> Environment: Mac OS X
> Reporter: Clark Fitzgerald
> Priority: Minor
>
> Calling SparkR::dapplyCollect with R functions that return dataframes
> produces an error. This comes up when returning columns of binary data, i.e.
> serialized fitted models. It also happens when functions return columns
> containing vectors.
> The error message:
> R computation failed with
> Error in (function (..., deparse.level = 1, make.row.names = TRUE,
> stringsAsFactors = default.stringsAsFactors()) :
> invalid list argument: all variables should have the same length
> Reproducible example:
> https://github.com/clarkfitzg/phd_research/blob/master/ddR/spark/sparkR_dapplyCollect7.R
> Relates to SPARK-16611