João Rafael created SPARK-26199:
-----------------------------------

             Summary: Long expressions cause mutate to fail
                 Key: SPARK-26199
                 URL: https://issues.apache.org/jira/browse/SPARK-26199
             Project: Spark
          Issue Type: Bug
          Components: SparkR
    Affects Versions: 2.2.0
            Reporter: João Rafael


Calling {{mutate(df, field = expr)}} fails when {{expr}} is long enough that its deparsed form spans more than one line.

Example:

{code:R}
df <- mutate(df, field = ifelse(
    lit(TRUE),
    lit("A"),
    ifelse(
        lit(T),
        lit("BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"),
        lit("C")
    )
))
{code}

Stack trace:

{code:R}
FATAL subscript out of bounds
  at .handleSimpleError(function (obj) 
{
    level = sapply(class(obj), sw
  at FUN(X[[i]], ...)
  at lapply(seq_along(args), function(i) {
    if (ns[[i]] != "") {

at lapply(seq_along(args), function(i) {
    if (ns[[i]] != "") {

at mutate(df, field = ifelse(lit(TRUE), lit("A"), ifelse(lit(T), lit("BBB
  at #78: mutate(df, field = ifelse(lit(TRUE), lit("A"), ifelse(lit(T
{code}

The root cause is in:
[DataFrame.R#L2182|https://github.com/apache/spark/blob/master/R/pkg/R/DataFrame.R#L2182]

When the expression is long, {{deparse}} returns multiple lines, causing 
{{args}} to have more elements than {{ns}}. A possible fix is to pass 
{{nlines = 1}} to {{deparse}} or to collapse the lines into a single string.
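
The {{deparse}} behaviour described above can be reproduced in base R without a Spark session. The following is only a sketch: the variable names are illustrative, {{lit}} is never evaluated because the call is quoted, and it does not reproduce the actual code in DataFrame.R.

{code:R}
# Quote the call so that nothing is evaluated; lit() need not exist here.
expr <- quote(ifelse(lit(TRUE), lit("A"), ifelse(lit(T),
    lit("BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"), lit("C"))))

# deparse() wraps its output (default width.cutoff = 60), so a long
# call comes back as a character vector with more than one element.
wrapped <- deparse(expr)
length(wrapped)

# Option 1: collapse the pieces into one string; the full text is kept.
collapsed <- paste(wrapped, collapse = "")
length(collapsed)

# Option 2: nlines = 1 also gives a single element, but it keeps only
# the first line of output, so the expression text is truncated.
first_only <- deparse(expr, nlines = 1)
length(first_only)
{code}

Note the trade-off between the two options: collapsing keeps the whole expression text, while {{nlines = 1}} discards everything after the first line. Which one is preferable depends on how the deparsed string is used afterwards.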

A simple workaround exists: first place the expression in a variable and 
pass that variable instead:

{code:R}
tmp <- ifelse(
    lit(TRUE),
    lit("A"),
    ifelse(
        lit(T),
        lit("BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"),
        lit("C")
    )
)
df <- mutate(df, field = tmp)
{code}
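
A plausible explanation for why this helps, assuming {{mutate}} deparses the unevaluated argument as described above: the argument is then just the symbol {{tmp}}, which deparses to a single short string. A base R sketch of that assumption:

{code:R}
# A bare symbol deparses to one element, regardless of how long the
# expression stored in the variable is.
short <- deparse(quote(tmp))
length(short)
{code}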




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
