Sergei Lebedev created SPARK-22805:
--------------------------------------
Summary: Use aliases for StorageLevel in event logs
Key: SPARK-22805
URL: https://issues.apache.org/jira/browse/SPARK-22805
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 2.2.1, 2.1.2
Reporter: Sergei Lebedev
Priority: Minor
Fact 1: {{StorageLevel}} has a private constructor, so the list of
predefined levels cannot be extended by users.
Fact 2: The event log format uses a redundant representation for storage
levels:
{code}
>>> len('{"Use Disk": true, "Use Memory": false, "Deserialized": true, "Replication": 1}')
79
>>> len('DISK_ONLY')
9
{code}
Fact 3: This leads to excessive log sizes for workloads with many
partitions, because every partition carries a storage level field that is
60-70 bytes larger than it needs to be.
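A rough back-of-the-envelope sketch of that overhead, reusing the two strings measured above. It assumes the spacing shown in the example (the real logs may be more compact), and the partition count passed in is a made-up figure; {{overhead_bytes}} is a hypothetical helper, not part of Spark.

```python
import json

# Verbose form, serialized with the same spacing as the example above.
verbose = json.dumps(
    {"Use Disk": True, "Use Memory": False, "Deserialized": True, "Replication": 1}
)
# The alias would be written as a JSON string, so count its quotes too.
alias = json.dumps("DISK_ONLY")


def overhead_bytes(num_partitions):
    """Extra bytes spent on the verbose form across all partitions."""
    per_partition = len(verbose) - len(alias)
    return per_partition * num_partitions


# Hypothetical workload size, for illustration only.
print(overhead_bytes(100_000))
```

With the quotes included, the difference per partition lands inside the 60-70 byte range quoted above.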
Suggested quick win: use the names of the predefined levels to identify them in
the event log.
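The suggestion could be sketched roughly as follows. This is only an illustration of the idea, not Spark's event log code: {{ALIASES}}, {{encode}} and {{decode}} are hypothetical names, and the table contains only the single entry taken from the example above. Unknown levels fall back to the verbose form, and the reader accepts both forms so that older logs would still parse.

```python
import json

# Hypothetical alias table. Only the level from the example above is listed;
# a real implementation would cover every predefined level.
ALIASES = {
    "DISK_ONLY": {
        "Use Disk": True,
        "Use Memory": False,
        "Deserialized": True,
        "Replication": 1,
    },
}


def encode(level):
    """Return the alias for a known level, else the verbose dict unchanged."""
    for name, fields in ALIASES.items():
        if fields == level:
            return name
    return level


def decode(value):
    """Accept an alias string (new logs) or a verbose dict (old logs)."""
    if isinstance(value, str):
        return dict(ALIASES[value])
    return value


level = {"Use Disk": True, "Use Memory": False, "Deserialized": True, "Replication": 1}
print(json.dumps(encode(level)))
```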