[jira] [Updated] (SPARK-57726) Fix NPE in AttributeReference.hashCode when the attribute name is null

Max Gekk (Jira) Sat, 27 Jun 2026 05:40:14 -0700


     [ https://issues.apache.org/jira/browse/SPARK-57726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Max Gekk updated SPARK-57726:
-----------------------------
    Description: 
h2. Summary

{{AttributeReference.hashCode}} computes the name's contribution to the hash with a
direct {{name.hashCode()}} call, which throws a {{NullPointerException}} when the
attribute has a {{null}} name. {{AttributeReference.equals}} already compares the name
null-safely ({{name == ar.name}}), so {{hashCode}} is inconsistent with {{equals}} for
null-named attributes: two such attributes can be compared, but hashing either one,
and therefore placing it in a hash-based collection, crashes.

h2. Affected code

{{org.apache.spark.sql.catalyst.expressions.AttributeReference.hashCode}} in
{{sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala}}:

{code:scala}
override def hashCode: Int = {
  // See http://stackoverflow.com/questions/113511/hash-code-implementation
  var h = 17
  h = h * 37 + name.hashCode()   // NPE if name == null
  h = h * 37 + dataType.hashCode()
  h = h * 37 + nullable.hashCode()
  h = h * 37 + metadata.hashCode()
  h = h * 37 + exprId.hashCode()
  h = h * 37 + qualifier.hashCode()
  h
}
{code}

h2. Reproduction (minimal, Catalyst level)

{code:scala}
import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.types.IntegerType

val a = AttributeReference(null, IntegerType)()
a.hashCode()  // throws; so does inserting a into a HashMap/HashSet or any other
              // hash-based use (very small immutable Sets may not hash at all)
{code}

Result:

{code}
java.lang.NullPointerException: Cannot invoke "Object.hashCode()" because
the return value of "...AttributeReference.name()" is null
  at org.apache.spark.sql.catalyst.expressions.AttributeReference.hashCode(namedExpressions.scala:...)
{code}

h2. How a null-named attribute arises

{{StructField}} permits a null name (no {{require(name != null)}}), and the name flows
unchanged through {{DataTypeUtils.toAttribute}} into {{AttributeReference}}. Such an
attribute can therefore reach hash-based collections during planning/analysis.
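
The path described above can be sketched in a Spark shell as follows. This is an illustration only: it assumes the Catalyst classes are on the classpath and that {{DataTypeUtils}} is importable from {{org.apache.spark.sql.catalyst.types}}, which the ticket does not state.

{code:scala}
// Assumed import path for DataTypeUtils; adjust if it differs in your build.
import org.apache.spark.sql.catalyst.types.DataTypeUtils
import org.apache.spark.sql.types.{IntegerType, StructField}

val field = StructField(null, IntegerType)   // accepted: no null check on the name
val attr = DataTypeUtils.toAttribute(field)  // the null name is carried over as-is
assert(attr.name == null)
{code}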

h2. Root cause

{{name.hashCode()}} is not null-safe, while {{equals}} is. For null-named attributes
{{hashCode}} therefore cannot satisfy the equals/hashCode contract (equal attributes
must return equal hash codes), and a recoverable situation turns into
a hard {{NullPointerException}}.
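
The mismatch can be reproduced without Spark. The sketch below uses a hypothetical class {{Attr}} (not part of Spark) that mirrors only the relevant pattern: a null-safe {{==}} in {{equals}} and a direct {{name.hashCode()}} in {{hashCode}}.

{code:scala}
import scala.util.Try

// Hypothetical stand-in for AttributeReference; only the name field is modeled.
final class Attr(val name: String) {
  override def equals(other: Any): Boolean = other match {
    case a: Attr => name == a.name   // Scala's == is null-safe
    case _ => false
  }
  // Same shape as the affected code: dereferences name without a null check.
  override def hashCode: Int = 17 * 37 + name.hashCode()
}

object RootCauseDemo {
  def main(args: Array[String]): Unit = {
    val x = new Attr(null)
    val y = new Attr(null)
    println(x == y)                     // true: equals handles the null name
    println(Try(x.hashCode).isFailure)  // true: hashCode throws NullPointerException
  }
}
{code}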

h2. Proposed fix

Use {{java.util.Objects.hashCode(name)}} (already imported) instead of
{{name.hashCode()}}:

{code:scala}
h = h * 37 + Objects.hashCode(name)
{code}
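
{{java.util.Objects.hashCode(Object)}} returns 0 for a null argument and otherwise delegates to the argument's own {{hashCode}}, so attributes with non-null names hash exactly as before. A minimal standalone check, independent of Spark:

{code:scala}
import java.util.Objects

object NullSafeHashDemo {
  def main(args: Array[String]): Unit = {
    val missing: String = null
    println(Objects.hashCode(missing))                // 0
    println(Objects.hashCode("id") == "id".hashCode)  // true
  }
}
{code}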

A regression test in {{NamedExpressionSuite}} asserts that {{hashCode}} does not throw on
a null-named attribute and that the equals/hashCode contract holds.
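
The test body is not shown in this ticket. One possible shape for such assertions, written in Spark-shell style like the reproduction above, is sketched below; it assumes the fix is applied and is not the code from the PR.

{code:scala}
import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.types.IntegerType

// Sketch only, not the actual test from NamedExpressionSuite.
val a = AttributeReference(null, IntegerType)()
// Reuse a's exprId so that the two attributes compare equal.
val b = AttributeReference(null, IntegerType)(exprId = a.exprId)

a.hashCode()                          // must not throw once the fix is applied
assert(a == b)                        // equals already handles the null name
assert(a.hashCode() == b.hashCode())  // equal attributes must hash equally
{code}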

h2. Related

Noticed during review of SPARK-57725 (NPE in {{AttributeSeq}} column resolution when an
attribute has a null name). The two issues are independent and are fixed in separate PRs.

> Fix NPE in AttributeReference.hashCode when the attribute name is null
> ----------------------------------------------------------------------
>
>                 Key: SPARK-57726
>                 URL: https://issues.apache.org/jira/browse/SPARK-57726
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 5.0.0
>            Reporter: Max Gekk
>            Priority: Major
>              Labels: pull-request-available
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
