Tim Lee created SPARK-55681:
-------------------------------

             Summary: Singleton DataType equality fails after deserialization
                 Key: SPARK-55681
                 URL: https://issues.apache.org/jira/browse/SPARK-55681
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 4.1.1
            Reporter: Tim Lee


When a singleton DataType (e.g., BinaryType, IntegerType) is deserialized by a
framework that bypasses readResolve(), such as Kryo with certain
configurations, a new instance of the class is created instead of returning
the case object singleton. This non-singleton instance then fails pattern
matching at every `case BinaryType =>` site in the codebase, because Scala
case object matching relies on equals(), which defaults to reference equality.
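
The fallthrough can be reproduced without Spark or Kryo. The following is a
minimal sketch under stated assumptions: DemoType and MatchDemo are
hypothetical names, not Spark code, and reflection is used only to mimic a
framework that instantiates the class without going through readResolve().

```scala
// Hypothetical stand-in for a singleton DataType (class + case object pattern).
class DemoType private ()
case object DemoType extends DemoType

object MatchDemo {
  // Assumption: reflection stands in for a deserializer that creates a fresh
  // instance instead of resolving to the case object singleton.
  def rogueInstance(): DemoType = {
    val ctor = classOf[DemoType].getDeclaredConstructor()
    ctor.setAccessible(true)
    ctor.newInstance()
  }

  // Value pattern: compares against the singleton using equals().
  def describe(dt: DemoType): String = dt match {
    case DemoType => "matched singleton"
    case _        => "fell through"
  }
}
```

With these definitions, MatchDemo.describe(DemoType) takes the first branch,
while MatchDemo.describe(MatchDemo.rogueInstance()) falls through to the
default branch, because the inherited equals() compares references.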

The failure manifests as silent match fallthrough, leading to errors like:
{code}
IllegalStateException: The data type 'binary' is not supported in generating
a writer function...
{code}

The issue affects all 14 singleton DataType classes that use the
`class + case object` pattern without an explicit equals() override:
BinaryType, BooleanType, ByteType, ShortType, IntegerType, LongType,
FloatType, DoubleType, DateType, TimestampType, TimestampNTZType, NullType,
CalendarIntervalType, VariantType

Other DataTypes already handle this correctly:
- VarcharType, CharType, TimeType, GeometryType, GeographyType: matched by
  type (`case _: XType =>`) rather than value
- StringType: has custom equals() comparing collationId
- DecimalType, ArrayType, MapType, StructType: case classes with
  auto-generated equals()
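
To illustrate why these types are unaffected, here is a small sketch using
hypothetical stand-ins (DemoString, DemoDecimal, and PatternDemo are
illustrative names, not Spark classes). It assumes a custom equals() in the
style described for StringType and a plain case class in the style described
for DecimalType.

```scala
// Hypothetical class with a custom equals() comparing collationId.
class DemoString(val collationId: Int) {
  override def equals(obj: Any): Boolean = obj match {
    case other: DemoString => other.collationId == collationId
    case _                 => false
  }
  override def hashCode(): Int = collationId.hashCode
}
case object DemoString extends DemoString(0)

// Hypothetical case class: the compiler generates structural equals().
final case class DemoDecimal(precision: Int, scale: Int)

object PatternDemo {
  // Value pattern: succeeds for any instance that equals() the singleton.
  def isDefaultString(x: Any): Boolean = x match {
    case DemoString => true
    case _          => false
  }

  // Type pattern: succeeds for any instance of the class.
  def isAnyString(x: Any): Boolean = x match {
    case _: DemoString => true
    case _             => false
  }
}
```

Here a freshly constructed new DemoString(0) still matches the value pattern
because equals() compares collationId rather than references, and any
DemoString instance matches the type pattern regardless of identity.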

*Proposed fix:* Override equals() and hashCode() on the 14 affected classes
(XType below stands for each affected class in turn):
{code:scala}
override def equals(obj: Any): Boolean = obj.isInstanceOf[XType]
override def hashCode(): Int = classOf[XType].getSimpleName.hashCode
{code}

getSimpleName is used because Scala's auto-generated hashCode for 0-arity
case objects returns productPrefix.hashCode (the simple class name). This
preserves the original hash values, avoiding any behavioral change for code
that depends on DataType hashCodes (e.g., HashMap/HashSet lookups,
canonicalization ordering).
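
The effect of the overrides can be sketched on a stand-in type. FixedType and
FixDemo are hypothetical names, not Spark code, and reflection is again only
an assumption standing in for a deserializer that skips readResolve().

```scala
// Hypothetical stand-in with the proposed overrides applied.
class FixedType private () {
  // Every instance of the class is equal to every other instance.
  override def equals(obj: Any): Boolean = obj.isInstanceOf[FixedType]
  // Hash of the simple class name, matching productPrefix.hashCode.
  override def hashCode(): Int = classOf[FixedType].getSimpleName.hashCode
}
case object FixedType extends FixedType

object FixDemo {
  // Assumption: reflection mimics a framework that bypasses the constructor
  // guard and does not resolve to the singleton.
  def rogueInstance(): FixedType = {
    val ctor = classOf[FixedType].getDeclaredConstructor()
    ctor.setAccessible(true)
    ctor.newInstance()
  }

  def describe(dt: FixedType): String = dt match {
    case FixedType => "matched singleton"
    case _         => "fell through"
  }
}
```

In this sketch the non-singleton instance matches `case FixedType =>` and
shares the singleton's hash code, so it lands in the same HashMap or HashSet
bucket as the singleton.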

The change is strictly additive: singleton-to-singleton equality is
unchanged; only the non-singleton edge case gains correct behavior. Although
the constructors are private, this is a compile-time guard only:
serialization frameworks bypass constructors at runtime.


