[
https://issues.apache.org/jira/browse/SPARK-27799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dongjoon Hyun updated SPARK-27799:
----------------------------------
Affects Version/s: (was: 3.0.0)
3.1.0
> Allow SerializerManager.canUseKryo whitelist to be extended via a
> configuration
> -------------------------------------------------------------------------------
>
> Key: SPARK-27799
> URL: https://issues.apache.org/jira/browse/SPARK-27799
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Affects Versions: 3.1.0
> Reporter: Josh Rosen
> Priority: Major
>
> Kryo serialization can offer a substantial performance boost compared to Java
> serialization and I generally recommend that users configure Spark to use it.
> That said, in general it may not be safe to _blindly_ flip the default to
> Kryo: certain jobs might depend on Java serialization, so switching them to
> Kryo might cause crashes or incorrect behavior.
> However, we may know that certain data types are safe to serialize with Kryo,
> in which case we can whitelist _just those types_ for use with Kryo
> serialization but keep everything else using the default Java serializer.
> Back in SPARK-13926 (Spark 2.0) I added a {{SerializerManager}} to implement
> this idea for strings, primitives, primitive arrays, and a few other data
> types: those types will automatically use Kryo serialization when used as
> top-level types in RDDs. However, there's no ability for users to customize /
> extend this whitelist.
> I propose to add a new user-facing configuration, name TBD, which accepts a
> comma-separated list of class / interface names and uses them to expand the
> {{SerializerMananger.canUseKryo}} whitelist.
> This will allow advanced users to incrementally default to Kryo for certain
> types (e.g. Scrooge ThriftStructs).
> This feature is useful for "data platform" teams who provide
> Spark-as-a-service to internal customers: with this proposed configuration,
> platform teams can configure global defaults for serialization in a way which
> is more incremental / narrow-in-scope than simply defaulting to Kryo
> everywhere.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]