Bruce Robbins created SPARK-42384:
-------------------------------------

             Summary: Mask function's generated code does not handle null input
                 Key: SPARK-42384
                 URL: https://issues.apache.org/jira/browse/SPARK-42384
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.5.0
            Reporter: Bruce Robbins


Example:
{noformat}
create or replace temp view v1 as
select * from values
(null),
('AbCD123-@$#')
as data(col1);

cache table v1;

select mask(col1) from v1;
{noformat}
This query results in a {{NullPointerException}}:
{noformat}
23/02/07 16:36:06 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
java.lang.NullPointerException
        at 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110)
        at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
 Source)
        at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
{noformat}
The generated code calls {{UnsafeWriter.write(0, value_0)}} regardless of 
whether {{Mask.transformInput}} returns null or not. The {{UnsafeWriter.write}} 
method for {{UTF8String}} does not expect a null pointer.
{noformat}
/* 031 */     boolean isNull_1 = i.isNullAt(0);
/* 032 */     UTF8String value_1 = isNull_1 ?
/* 033 */     null : (i.getUTF8String(0));
/* 034 */
/* 035 */
/* 036 */
/* 037 */
/* 038 */     UTF8String value_0 = null;
/* 039 */     value_0 = 
org.apache.spark.sql.catalyst.expressions.Mask.transformInput(value_1, 
((UTF8String) references[0] /* literal */), ((UTF8String) references[1] /* 
literal */), ((UTF8String) references[2] /* literal */), ((UTF8String) 
references[3] /* literal */));;
/* 040 */     if (false) {
/* 041 */       mutableStateArray_0[0].setNullAt(0);
/* 042 */     } else {
/* 043 */       mutableStateArray_0[0].write(0, value_0);
/* 044 */     }
/* 045 */     return (mutableStateArray_0[0].getRow());
/* 046 */   }
{noformat}

The bug is not exercised by a literal null input value, since there appears to 
be some optimization that simply replaces the entire function call with a null 
literal:
{noformat}
spark-sql> explain SELECT mask(NULL);
== Physical Plan ==
*(1) Project [null AS mask(NULL, X, x, n, NULL)#47]
+- *(1) Scan OneRowRelation[]

Time taken: 0.026 seconds, Fetched 1 row(s)
spark-sql> SELECT mask(NULL);
NULL
Time taken: 0.042 seconds, Fetched 1 row(s)
spark-sql> 
{noformat}




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to