[ https://issues.apache.org/jira/browse/SPARK-22805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16295328#comment-16295328 ]
Marcelo Vanzin commented on SPARK-22805:
----------------------------------------
Is the question about compatibility? There's none; old Spark versions are not
expected to be able to read new logs. Backporting to 2.2 or 2.1 is sketchier
because it would mean that e.g. 2.2.2 logs wouldn't be readable by the 2.2.1
SHS (Spark History Server).
> Use aliases for StorageLevel in event logs
> ------------------------------------------
>
> Key: SPARK-22805
> URL: https://issues.apache.org/jira/browse/SPARK-22805
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.1.2, 2.2.1
> Reporter: Sergei Lebedev
> Priority: Minor
>
> Fact 1: {{StorageLevel}} has a private constructor, so the list of
> predefined levels is not extendable by users.
> Fact 2: The event log format uses a redundant representation for storage
> levels:
> {code}
> >>> len('{"Use Disk": true, "Use Memory": false, "Deserialized": true,
> >>> "Replication": 1}')
> 79
> >>> len('DISK_ONLY')
> 9
> {code}
> Fact 3: This leads to excessive log sizes for workloads with many partitions,
> because every partition carries a storage level field that is 60-70 bytes
> larger than it needs to be.
> Suggested quick win: use the names of the predefined levels to identify them
> in the event log.
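> A minimal sketch of what the alias-based approach could look like on the
> Scala side. The helper names below are illustrative, not part of Spark's
> {{JsonProtocol}}; only {{StorageLevel.fromString}} and the predefined
> constants are existing Spark API:
> {code}
> import org.apache.spark.storage.StorageLevel
>
> // Illustrative alias table mirroring the predefined constants on the
> // StorageLevel companion object (not exhaustive).
> val predefined = Map(
>   "NONE" -> StorageLevel.NONE,
>   "DISK_ONLY" -> StorageLevel.DISK_ONLY,
>   "MEMORY_ONLY" -> StorageLevel.MEMORY_ONLY,
>   "MEMORY_ONLY_SER" -> StorageLevel.MEMORY_ONLY_SER,
>   "MEMORY_AND_DISK" -> StorageLevel.MEMORY_AND_DISK,
>   "MEMORY_AND_DISK_SER" -> StorageLevel.MEMORY_AND_DISK_SER)
>
> // Writing: emit the alias when the level matches a predefined one,
> // and fall back to the existing verbose JSON object otherwise.
> def storageLevelToAlias(level: StorageLevel): Option[String] =
>   predefined.collectFirst { case (name, l) if l == level => name }
>
> // Reading: StorageLevel.fromString already resolves an alias to a level.
> def aliasToStorageLevel(name: String): StorageLevel =
>   StorageLevel.fromString(name)
> {code}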