[ 
https://issues.apache.org/jira/browse/SPARK-55681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Lee updated SPARK-55681:
----------------------------
    Description: 
When a singleton DataType (e.g., BinaryType, IntegerType) is deserialized by a
framework that bypasses readResolve(), such as Kryo with certain
configurations, a new instance of the class is created instead of the case
object singleton being returned. This non-singleton instance then fails pattern
matching at every `case BinaryType =>` site in the codebase: a value pattern is
checked with ==, which calls equals(), and these classes inherit the default
reference-equality implementation.
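
To make the mechanism concrete, here is a minimal, self-contained sketch. The
names DemoType and secondInstance() are hypothetical and are not Spark code;
secondInstance() only stands in for a deserializer that skips readResolve() and
allocates a fresh instance:
{code:scala}
// Hypothetical stand-in for a singleton DataType using the
// `class + case object` pattern. Not Spark's actual code.
class DemoType private ()

case object DemoType extends DemoType {
  // Assumption: simulates a deserializer that bypasses readResolve().
  // The companion may call the private constructor, so no reflection is needed.
  def secondInstance(): DemoType = new DemoType()
}

object MatchFallthroughDemo {
  def describe(dt: DemoType): String = dt match {
    case DemoType => "matched singleton" // compiled to an == check
    case _        => "fell through"
  }

  def main(args: Array[String]): Unit = {
    println(describe(DemoType))                  // matched singleton
    println(describe(DemoType.secondInstance())) // fell through
  }
}
{code}
Without an equals() override, the second instance is never equal to the
singleton, so it reaches the default case.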

The failure manifests as silent match fallthrough, leading to errors like:
{code}
IllegalStateException: The data type 'binary' is not supported in generating a
writer function...
{code}
 

The issue affects all 14 singleton DataType classes that use the `class + case 
object` pattern without an explicit equals() override:

 
{code}
BinaryType, BooleanType, ByteType, ShortType, IntegerType, LongType, FloatType,
DoubleType, DateType, TimestampType, TimestampNTZType, NullType,
CalendarIntervalType, VariantType
{code}
 

Other DataTypes already handle this correctly:
 # VarcharType, CharType, TimeType, GeometryType, GeographyType — matched by 
type (`case _: XType =>`) rather than value
 # StringType — has custom equals() comparing collationId
 # DecimalType, ArrayType, MapType, StructType — case classes with 
auto-generated equals()
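
As a rough illustration of why type patterns and case classes are unaffected,
the sketch below uses hypothetical types (TypeMatched and Parameterized are not
Spark definitions). A type pattern only checks the runtime class, and a case
class compares its fields:
{code:scala}
// Hypothetical types for illustration only; not Spark's definitions.
class TypeMatched private ()

case object TypeMatched extends TypeMatched {
  // Assumption: simulates an instance created by bypassing readResolve().
  def secondInstance(): TypeMatched = new TypeMatched()
}

// Case classes get a compiler-generated, field-by-field equals().
final case class Parameterized(precision: Int, scale: Int)

object UnaffectedPatternsDemo {
  // A type pattern ignores instance identity and checks only the class.
  def matchedByType(x: Any): Boolean = x match {
    case _: TypeMatched => true
    case _              => false
  }

  def main(args: Array[String]): Unit = {
    println(matchedByType(TypeMatched.secondInstance()))  // true
    println(Parameterized(10, 2) == Parameterized(10, 2)) // true
  }
}
{code}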

*Proposed fix:* Override equals() and hashCode() on the 14 affected classes:
{code:scala}
override def equals(obj: Any): Boolean = obj.isInstanceOf[XType]
override def hashCode(): Int = classOf[XType].getSimpleName.hashCode
{code}
getSimpleName is used because Scala's auto-generated hashCode for 0-arity case 
objects returns productPrefix.hashCode (the simple class name). This preserves 
the original hash values, avoiding any behavioral change for code that depends 
on DataType hashCodes (e.g., HashMap/HashSet lookups, canonicalization 
ordering).
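
The effect of the overrides can be sketched in isolation as follows. FixedType
and secondInstance() are hypothetical names, and secondInstance() is only a
stand-in for a deserializer that skips readResolve(); this is an illustration
under those assumptions, not the actual patch:
{code:scala}
// Hypothetical stand-in for an affected class with the proposed overrides.
// Not Spark's actual code.
class FixedType private () {
  override def equals(obj: Any): Boolean = obj.isInstanceOf[FixedType]
  override def hashCode(): Int = classOf[FixedType].getSimpleName.hashCode
}

case object FixedType extends FixedType {
  // Assumption: simulates an instance created by bypassing readResolve().
  def secondInstance(): FixedType = new FixedType()
}

object EqualsOverrideDemo {
  def describe(dt: FixedType): String = dt match {
    case FixedType => "matched singleton"
    case _         => "fell through"
  }

  def main(args: Array[String]): Unit = {
    val copy = FixedType.secondInstance()
    println(copy eq FixedType)                        // false: distinct instance
    println(describe(copy))                           // matched singleton
    println(copy.hashCode == FixedType.hashCode)      // true
    println(Set[FixedType](FixedType).contains(copy)) // true
  }
}
{code}
The copy is still a separate object, but it now compares equal to the
singleton, so value patterns and hash-based lookups treat the two alike.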

The change is strictly additive: singleton-to-singleton equality is unchanged;
only the non-singleton edge case gains correct behavior. The private
constructors are a compile-time guard only; serialization frameworks can bypass
constructors at runtime.


> Singleton DataType equality fails after deserialization
> -------------------------------------------------------
>
>                 Key: SPARK-55681
>                 URL: https://issues.apache.org/jira/browse/SPARK-55681
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 4.1.1
>            Reporter: Tim Lee
>            Priority: Major


