Tim Lee created SPARK-55681:
-------------------------------
Summary: Singleton DataType equality fails after deserialization
Key: SPARK-55681
URL: https://issues.apache.org/jira/browse/SPARK-55681
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 4.1.1
Reporter: Tim Lee
When a singleton DataType (e.g., BinaryType, IntegerType) is deserialized by a
framework that bypasses readResolve(), such as Kryo with certain
configurations, a new instance of the class is created instead of the case
object singleton being returned. This non-singleton instance then fails to
match at every `case BinaryType =>` site in the codebase: a pattern that names
a case object compares with ==, which calls equals(), and these classes
inherit the default reference equality.
The failure manifests as silent match fallthrough, leading to errors like:
{code}
IllegalStateException: The data type 'binary' is not supported in generating
a writer function...
{code}
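A minimal reproduction sketch of the fallthrough, using simplified stand-ins rather than the real Spark classes. Reflection stands in here for a deserializer that skips readResolve(); the names MatchFallthroughDemo, instantiateBypassingSingleton, and describe are illustrative assumptions, as is the reduced DataType base class.

```scala
// Simplified stand-ins; NOT the real org.apache.spark.sql.types classes.
abstract class DataType extends Serializable

// Same "class + case object" shape as the affected types, with no equals() override.
class BinaryType private () extends DataType
case object BinaryType extends BinaryType

object MatchFallthroughDemo {
  // Assumption: reflection plays the role of a framework that instantiates
  // the class directly instead of resolving to the singleton.
  def instantiateBypassingSingleton(): BinaryType = {
    val ctor = classOf[BinaryType].getDeclaredConstructor()
    ctor.setAccessible(true)
    ctor.newInstance()
  }

  // A value pattern: compares the scrutinee to the case object via ==.
  def describe(dt: DataType): String = dt match {
    case BinaryType => "binary"
    case _          => "unsupported"
  }

  def main(args: Array[String]): Unit = {
    val rogue = instantiateBypassingSingleton()
    println(describe(BinaryType)) // the singleton matches
    println(describe(rogue))      // the second instance falls through to the default case
  }
}
```

The second instance has the right runtime class but is a different reference, so the value pattern does not match it and the default branch is taken.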
The issue affects all 14 singleton DataType classes that use the
`class + case object` pattern without an explicit equals() override:
BinaryType, BooleanType, ByteType, ShortType, IntegerType, LongType,
FloatType, DoubleType, DateType, TimestampType, TimestampNTZType, NullType,
CalendarIntervalType, VariantType
Other DataTypes already handle this correctly:
- VarcharType, CharType, TimeType, GeometryType, GeographyType — matched by
type (`case _: XType =>`) rather than value
- StringType — has custom equals() comparing collationId
- DecimalType, ArrayType, MapType, StructType — case classes with
auto-generated equals()
*Proposed fix:* Override equals() and hashCode() on the 14 affected classes:
{code:scala}
override def equals(obj: Any): Boolean = obj.isInstanceOf[XType]
override def hashCode(): Int = classOf[XType].getSimpleName.hashCode
{code}
getSimpleName is used because Scala's auto-generated hashCode for 0-arity
case objects returns productPrefix.hashCode (the simple class name). This
preserves the original hash values, avoiding any behavioral
change for code that depends on DataType hashCodes (e.g., HashMap/HashSet
lookups, canonicalization ordering).
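The overrides can be tried on a stand-in type as follows. This is a sketch under assumptions, not the actual patch: IntegerType here is a simplified top-level stand-in, reflection again simulates a deserializer that skips readResolve(), and EqualsFixDemo, instantiateBypassingSingleton, and describe are hypothetical names.

```scala
// Simplified stand-in for one affected type, with the proposed overrides applied.
class IntegerType private () extends Serializable {
  // Any instance of the class is considered equal to any other.
  override def equals(obj: Any): Boolean = obj.isInstanceOf[IntegerType]
  // Hash of the simple class name, intended to match the case object's original hash.
  override def hashCode(): Int = classOf[IntegerType].getSimpleName.hashCode
}
case object IntegerType extends IntegerType

object EqualsFixDemo {
  // Assumption: reflection simulates a framework that creates a second instance.
  def instantiateBypassingSingleton(): IntegerType = {
    val ctor = classOf[IntegerType].getDeclaredConstructor()
    ctor.setAccessible(true)
    ctor.newInstance()
  }

  def describe(dt: Any): String = dt match {
    case IntegerType => "int"
    case _           => "unsupported"
  }

  def main(args: Array[String]): Unit = {
    val rogue = instantiateBypassingSingleton()
    println(rogue eq IntegerType)                           // still a distinct reference
    println(rogue == IntegerType)                           // equal under the override
    println(describe(rogue))                                // value pattern now matches
    println(IntegerType.hashCode == "IntegerType".hashCode) // hash equals the name's hash
  }
}
```

Because the case object inherits both overrides from the class, the singleton and any stray instance agree on equals() and hashCode(), which keeps them interchangeable as HashMap/HashSet keys.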
The change is strictly additive: singleton-to-singleton equality is
unchanged; only the non-singleton edge case gains correct behavior. Although
the constructors are private, this is a compile-time guard only
— serialization frameworks bypass constructors at runtime.