[ https://issues.apache.org/jira/browse/SPARK-22805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16292742#comment-16292742 ]
Sean Owen commented on SPARK-22805:
-----------------------------------

The current format is somewhat more flexible but yes it's verbose. How much difference does it make in practice? The problem with changing it is backwards compatibility.

> Use aliases for StorageLevel in event logs
> ------------------------------------------
>
>                 Key: SPARK-22805
>                 URL: https://issues.apache.org/jira/browse/SPARK-22805
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.1.2, 2.2.1
>            Reporter: Sergei Lebedev
>            Priority: Minor
>
> Fact 1: {{StorageLevel}} has a private constructor, therefore a list of
> predefined levels is not extendable (by the users).
> Fact 2: The format of event logs uses redundant representation for storage
> levels
> {code}
> >>> len('{"Use Disk": true, "Use Memory": false, "Deserialized": true, "Replication": 1}')
> 79
> >>> len('DISK_ONLY')
> 9
> {code}
> Fact 3: This leads to excessive log sizes for workloads with lots of
> partitions, because every partition would have the storage level field which
> is 60-70 bytes more than it should be.
> Suggested quick win: use the names of the predefined levels to identify them
> in the event log.
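
To make the suggested quick win and the compatibility concern concrete, here is a minimal Python sketch. It is not Spark's event-log code. The names `ALIASES`, `encode_level` and `decode_level` are hypothetical, the alias table covers only a few predefined levels, and its flag values are an assumption that should be checked against the `StorageLevel` source. The idea is to write an alias when the level is a predefined one, fall back to the verbose form otherwise, and have the reader accept both forms so that existing logs still parse.

```python
import json

# Hypothetical alias table, keyed by
# (use_disk, use_memory, deserialized, replication).
# Only a subset of levels; flag values are assumed, not taken from the issue.
ALIASES = {
    (True, False, False, 1): "DISK_ONLY",
    (False, True, True, 1): "MEMORY_ONLY",
    (False, True, False, 1): "MEMORY_ONLY_SER",
    (True, True, True, 1): "MEMORY_AND_DISK",
}
NAMES = {name: flags for flags, name in ALIASES.items()}


def encode_level(use_disk, use_memory, deserialized, replication):
    """Return the alias string if the level is predefined, else the verbose dict."""
    key = (use_disk, use_memory, deserialized, replication)
    if key in ALIASES:
        return ALIASES[key]
    return {
        "Use Disk": use_disk,
        "Use Memory": use_memory,
        "Deserialized": deserialized,
        "Replication": replication,
    }


def decode_level(value):
    """Accept both the alias form and the verbose form used by existing logs."""
    if isinstance(value, str):
        return NAMES[value]
    return (
        value["Use Disk"],
        value["Use Memory"],
        value["Deserialized"],
        value["Replication"],
    )


# Size difference for one storage level field, measured on the JSON text.
verbose = json.dumps({
    "Use Disk": True,
    "Use Memory": False,
    "Deserialized": False,
    "Replication": 1,
})
alias = json.dumps(encode_level(True, False, False, 1))
saved_per_partition = len(verbose) - len(alias)
print(saved_per_partition)
```

A reader written this way handles both old and new logs, but an older reader that only understands the verbose form would fail on the alias string, which is the backwards-compatibility problem raised in the comment.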