[ https://issues.apache.org/jira/browse/SPARK-27799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-27799: ---------------------------------- Affects Version/s: (was: 3.0.0) 3.1.0 > Allow SerializerManager.canUseKryo whitelist to be extended via a > configuration > ------------------------------------------------------------------------------- > > Key: SPARK-27799 > URL: https://issues.apache.org/jira/browse/SPARK-27799 > Project: Spark > Issue Type: New Feature > Components: Spark Core > Affects Versions: 3.1.0 > Reporter: Josh Rosen > Priority: Major > > Kryo serialization can offer a substantial performance boost compared to Java > serialization and I generally recommend that users configure Spark to use it. > That said, in general it may not be safe to _blindly_ flip the default to > Kryo: certain jobs might depend on Java serialization, so switching them to > Kryo might cause crashes or incorrect behavior. > However, we may know that certain data types are safe to serialize with Kryo, > in which case we can whitelist _just those types_ for use with Kryo > serialization but keep everything else using the default Java serializer. > Back in SPARK-13926 (Spark 2.0) I added a {{SerializerManager}} to implement > this idea for strings, primitives, primitive arrays, and a few other data > types: those types will automatically use Kryo serialization when used as > top-level types in RDDs. However, there's no ability for users to customize / > extend this whitelist. > I propose to add a new user-facing configuration, name TBD, which accepts a > comma-separated list of class / interface names and uses them to expand the > {{SerializerMananger.canUseKryo}} whitelist. > This will allow advanced users to incrementally default to Kryo for certain > types (e.g. Scrooge ThriftStructs). > This feature is useful for "data platform" teams who provide > Spark-as-a-service to internal customers: with this proposed configuration, > platform teams can configure global defaults for serialization in a way which > is more incremental / narrow-in-scope than simply defaulting to Kryo > everywhere. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org