[jira] [Commented] (SPARK-16785) dapply doesn't return array or raw columns

Clark Fitzgerald (JIRA) Fri, 19 Aug 2016 00:56:44 -0700

    [ https://issues.apache.org/jira/browse/SPARK-16785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427785#comment-15427785 ]


Clark Fitzgerald commented on SPARK-16785:
------------------------------------------

Making some slow progress digging into this. Here's the failing test:
{code}
  df_listcols <- data.frame(key = 1:3)
  df_listcols$bytes <- lapply(df_listcols$key, serialize, connection = NULL)
  df_listcols_spark <- createDataFrame(df_listcols)
  result1 <- collect(df_listcols_spark)
  expect_identical(df_listcols, result1)
  result2 <- dapplyCollect(df_listcols_spark, function(x) x)   # FAILS HERE
  expect_equal(df_listcols, result2)
{code}

And the error message:

{code}
# R computation failed with
# Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  :
#  arguments imply differing number of rows: 3, 26
#        at org.apache.spark.api.r.RRunner.compute(RRunner.scala:108)
{code}
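
One possible reading of the "3, 26" mismatch, sketched in base R only (no Spark session involved): a raw vector produced by serialize() for a single integer should be 26 bytes long under serialization format version 2, so the error is what data.frame() reports when such a vector is handed over as an ordinary column instead of as one cell of a list column. Whether this is what actually happens inside the SparkR worker is an assumption and not something confirmed in this thread. The explicit version = 2 argument and the I() wrapper are illustrative choices, not part of the failing test.

{code}
# Base R only; this does not touch SparkR.
keys <- 1:3
bytes <- lapply(keys, serialize, connection = NULL, version = 2)

# Length of each raw vector; this is the row count data.frame() sees
# if an element is passed as a column rather than as a list cell.
print(lengths(bytes))

# Passing one raw vector directly as a column triggers the same kind of error.
err <- tryCatch(
  data.frame(key = keys, bytes = bytes[[1]]),
  error = function(e) conditionMessage(e)
)
print(err)

# Wrapping the list in I() keeps it as a single list column with 3 rows.
ok <- data.frame(key = keys, bytes = I(bytes))
print(nrow(ok))
{code}

If the lengths printed here match the second number in the error, that would point at the list column being flattened somewhere before the data frame is rebuilt.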

A separate but related issue with array columns that [~shivaram] has mentioned 
is that the dataframe can't be collected if this column is added:

{code}
  df_listcols$arr <- lapply(df_listcols$key,
                            function(x) seq(0, 1, length.out=15))
{code}

Will continue looking at this on Monday.

> dapply doesn't return array or raw columns
> ------------------------------------------
>
>                 Key: SPARK-16785
>                 URL: https://issues.apache.org/jira/browse/SPARK-16785
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.0.0
>         Environment: Mac OS X
>            Reporter: Clark Fitzgerald
>            Priority: Minor
>
> Calling SparkR::dapplyCollect with R functions that return data frames 
> produces an error. This comes up when returning columns of binary data, for 
> example serialized fitted models. It also happens when functions return 
> columns containing vectors.
> The error message:
> R computation failed with
>  Error in (function (..., deparse.level = 1, make.row.names = TRUE, 
> stringsAsFactors = default.stringsAsFactors())  :
>   invalid list argument: all variables should have the same length
> Reproducible example: 
> https://github.com/clarkfitzg/phd_research/blob/master/ddR/spark/sparkR_dapplyCollect7.R
> Relates to SPARK-16611



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
