MvR created ARROW-10916:
---------------------------
Summary: gapply fails executing with rbind error
Key: ARROW-10916
URL: https://issues.apache.org/jira/browse/ARROW-10916
Project: Apache Arrow
Issue Type: Bug
Components: R
Affects Versions: 2.0.0
Environment: Databricks runtime 7.3 LTS ML
Reporter: MvR
Attachments: Rerror.log
Executing the following code on Databricks Runtime 7.3 LTS ML fails with an
rbind error, whereas the same code runs successfully when Arrow is not enabled
in the Spark session. The full error message is attached (Rerror.log).
```
library(dplyr)
library(SparkR)

SparkR::sparkR.session(
  sparkConfig = list(spark.sql.execution.arrow.sparkr.enabled = "true")
)

mtcars %>%
  SparkR::as.DataFrame() %>%
  SparkR::gapply(
    x = .,
    cols = c("cyl", "vs"),
    func = function(key, data) {
      dt <- data[, c("mpg", "qsec")]
      res <- apply(dt, 2, mean)
      df <- data.frame(firstGroupKey = key[1],
                       secondGroupKey = key[2],
                       mean_mpg = res[1],
                       mean_cyl = res[2])
      return(df)
    },
    schema = structType(structField("cyl", "double"),
                        structField("vs", "double"),
                        structField("mpg_mean", "double"),
                        structField("qsec_mean", "double"))
  ) %>%
  display()
```
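For comparison, the same per-group computation can be run in plain R without Spark or Arrow, which may help separate problems in the grouping logic from problems in the Arrow code path. This is only a minimal sketch under stated assumptions: `group_fn`, `groups` and `local_result` are hypothetical names not present in the report, the key is passed as a plain numeric vector (which may differ from what SparkR actually passes to `func`), and the output column names follow the schema declared above.

```
# Hypothetical local check: same per-group means, no Spark session involved.
group_fn <- function(key, data) {
  dt <- data[, c("mpg", "qsec")]
  res <- apply(dt, 2, mean)            # column means for this group
  data.frame(cyl = key[1],
             vs = key[2],
             mpg_mean = unname(res["mpg"]),
             qsec_mean = unname(res["qsec"]))
}

# Split mtcars by the same grouping columns; drop empty cyl/vs combinations.
groups <- split(mtcars, list(mtcars$cyl, mtcars$vs), drop = TRUE)

# Assumption: the key is a plain numeric vector of the group's cyl and vs.
results <- lapply(groups, function(g) group_fn(c(g$cyl[1], g$vs[1]), g))

# Combine the one-row data frames, as one would expect gapply to do.
local_result <- do.call(rbind, results)
rownames(local_result) <- NULL
```

If this local version combines cleanly, the rbind failure is more likely tied to how results are assembled when Arrow is enabled than to the grouping function itself, although this sketch does not prove that.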
--
This message was sent by Atlassian Jira
(v8.3.4#803005)