Bruce Robbins created SPARK-42384:
-------------------------------------

             Summary: Mask function's generated code does not handle null input
                 Key: SPARK-42384
                 URL: https://issues.apache.org/jira/browse/SPARK-42384
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.5.0
            Reporter: Bruce Robbins

Example:
{noformat}
create or replace temp view v1 as
select * from values
(null),
('AbCD123-@$#')
as data(col1);

cache table v1;

select mask(col1) from v1;
{noformat}
This query results in a {{NullPointerException}}:
{noformat}
23/02/07 16:36:06 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
java.lang.NullPointerException
	at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
{noformat}
The generated code calls {{UnsafeWriter.write(0, value_0)}} regardless of whether {{Mask.transformInput}} returns null or not. The {{UnsafeWriter.write}} method for {{UTF8String}} does not expect a null pointer.
{noformat}
/* 031 */         boolean isNull_1 = i.isNullAt(0);
/* 032 */         UTF8String value_1 = isNull_1 ?
/* 033 */         null : (i.getUTF8String(0));
/* 034 */
/* 035 */
/* 036 */
/* 037 */
/* 038 */         UTF8String value_0 = null;
/* 039 */         value_0 = org.apache.spark.sql.catalyst.expressions.Mask.transformInput(value_1, ((UTF8String) references[0] /* literal */), ((UTF8String) references[1] /* literal */), ((UTF8String) references[2] /* literal */), ((UTF8String) references[3] /* literal */));;
/* 040 */         if (false) {
/* 041 */           mutableStateArray_0[0].setNullAt(0);
/* 042 */         } else {
/* 043 */           mutableStateArray_0[0].write(0, value_0);
/* 044 */         }
/* 045 */         return (mutableStateArray_0[0].getRow());
/* 046 */       }
{noformat}
The bug is not exercised by a literal null input value, since there appears to be some optimization that simply replaces the entire function call with a null literal:
{noformat}
spark-sql> explain SELECT mask(NULL);
== Physical Plan ==
*(1) Project [null AS mask(NULL, X, x, n, NULL)#47]
+- *(1) Scan OneRowRelation[]

Time taken: 0.026 seconds, Fetched 1 row(s)
spark-sql> SELECT mask(NULL);
NULL
Time taken: 0.042 seconds, Fetched 1 row(s)
spark-sql>
{noformat}
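
For illustration only, the missing null check described above can be modeled in a few lines of plain Java. This is a minimal sketch, not Spark code and not a proposed patch: {{ToyRowWriter}}, {{unguardedWrite}}, and {{guardedWrite}} are hypothetical names, and the toy {{transformInput}} only mimics the default masking characters shown in the plan (X, x, n, other characters unchanged) under the assumption that it returns null for null input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; none of these are Spark classes.
final class MaskNullGuardSketch {

    /** Toy row writer whose write() does not expect null, like the writer in the report. */
    static final class ToyRowWriter {
        final List<String> slots = new ArrayList<>();

        void write(String value) {
            // Dereferencing a null value throws NullPointerException here.
            int length = value.length();
            slots.add(value.substring(0, length));
        }

        void setNull() {
            slots.add("NULL");
        }
    }

    /** Toy masking function; assumed to return null for null input. */
    static String transformInput(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder masked = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (Character.isUpperCase(c)) {
                masked.append('X');
            } else if (Character.isLowerCase(c)) {
                masked.append('x');
            } else if (Character.isDigit(c)) {
                masked.append('n');
            } else {
                masked.append(c); // other characters are left as-is
            }
        }
        return masked.toString();
    }

    /** Same shape as the reported code: the result is written without a null check. */
    static void unguardedWrite(ToyRowWriter writer, String input) {
        String value = transformInput(input);
        writer.write(value);
    }

    /** Guarded variant: the null check is based on the function's actual result. */
    static void guardedWrite(ToyRowWriter writer, String input) {
        String value = transformInput(input);
        if (value == null) {
            writer.setNull();
        } else {
            writer.write(value);
        }
    }

    public static void main(String[] args) {
        ToyRowWriter writer = new ToyRowWriter();
        guardedWrite(writer, null);
        guardedWrite(writer, "AbCD123-@$#");
        System.out.println(writer.slots);

        try {
            unguardedWrite(new ToyRowWriter(), null);
        } catch (NullPointerException e) {
            System.out.println("unguarded write failed on null input");
        }
    }
}
```

In this toy model, the guarded variant records a null slot for the null row and a masked string for the other row, while the unguarded variant fails on the null row in the same way as the query above. How the real fix should be expressed in the expression's codegen is left open here.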