João Rafael created SPARK-26199:
-----------------------------------
Summary: Long expressions cause mutate to fail
Key: SPARK-26199
URL: https://issues.apache.org/jira/browse/SPARK-26199
Project: Spark
Issue Type: Bug
Components: SparkR
Affects Versions: 2.2.0
Reporter: João Rafael
Calling {{mutate(df, field = expr)}} fails when {{expr}} is long enough that its deparsed form spans more than one line.
Example:
{code:R}
df <- mutate(df, field = ifelse(
  lit(TRUE),
  lit("A"),
  ifelse(
    lit(T),
    lit("BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"),
    lit("C")
  )
))
{code}
Stack trace:
{code:R}
FATAL subscript out of bounds
at .handleSimpleError(function (obj)
{
level = sapply(class(obj), sw
at FUN(X[[i]], ...)
at lapply(seq_along(args), function(i) {
if (ns[[i]] != "") {
at lapply(seq_along(args), function(i) {
if (ns[[i]] != "") {
at mutate(df, field = ifelse(lit(TRUE), lit("A"), ifelse(lit(T), lit("BBB
at #78: mutate(df, field = ifelse(lit(TRUE), lit("A"), ifelse(lit(T
{code}
The root cause is in:
[DataFrame.R#L2182|https://github.com/apache/spark/blob/master/R/pkg/R/DataFrame.R#L2182]
When the expression is long, {{deparse}} returns multiple lines, causing
{{args}} to have more elements than {{ns}}. A possible fix is to pass
{{nlines = 1}} to {{deparse}} or to collapse the lines into a single string.
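The {{deparse}} behaviour can be checked in base R without a Spark session. The snippet below is only a sketch of the two options mentioned above, not the actual SparkR code; the variable names ({{long_expr}}, {{pieces}}, {{collapsed}}, {{first_line}}) are made up for illustration, and the expression is quoted rather than evaluated so that {{lit}} does not need to be defined.

{code:R}
# Quote the expression from the example so it is deparsed, not evaluated.
long_expr <- quote(
  ifelse(
    lit(TRUE),
    lit("A"),
    ifelse(
      lit(T),
      lit("BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"),
      lit("C")
    )
  )
)

# With the default width.cutoff, deparse() splits this call into
# several strings, one per output line.
pieces <- deparse(long_expr)
length(pieces) > 1

# Option 1: collapse the lines into one string (keeps the whole expression).
collapsed <- paste(deparse(long_expr), collapse = " ")
length(collapsed) == 1

# Option 2: nlines = 1 returns only the first line
# (the remainder of the expression is dropped from the string).
first_line <- deparse(long_expr, nlines = 1)
length(first_line) == 1

# A bare variable name deparses to a single short string.
length(deparse(quote(tmp))) == 1
{code}

Collapsing keeps the full expression text, while {{nlines = 1}} keeps only its first line; which of the two is preferable depends on how the deparsed string is used afterwards.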
A simple workaround exists: assign the expression to a variable first and
pass the variable to {{mutate}} instead:
{code:R}
tmp <- ifelse(
  lit(TRUE),
  lit("A"),
  ifelse(
    lit(T),
    lit("BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"),
    lit("C")
  )
)
df <- mutate(df, field = tmp)
{code}