Sergei Lebedev created SPARK-22805:
--------------------------------------

             Summary: Use aliases for StorageLevel in event logs
                 Key: SPARK-22805
                 URL: https://issues.apache.org/jira/browse/SPARK-22805
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 2.2.1, 2.1.2
            Reporter: Sergei Lebedev
            Priority: Minor


Fact 1: {{StorageLevel}} has a private constructor, so users cannot extend the 
list of predefined levels.

Fact 2: The event log format uses a redundant representation for storage 
levels:

{code}
>>> len('{"Use Disk": true, "Use Memory": false, "Deserialized": true, "Replication": 1}')
79
>>> len('DISK_ONLY')
9
{code}

Fact 3: This leads to excessive log sizes for workloads with many partitions, 
because every partition carries a storage level field that is 60-70 bytes 
larger than it needs to be.
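To put a rough number on this, here is a back-of-the-envelope sketch. It assumes that each partition logs its storage level exactly once and that the alias would be written as a quoted JSON string; the helper name {{estimated_overhead_bytes}} and the partition count of 100,000 are made up for illustration.

```python
import json

# Verbose form, copied from the example in Fact 2.
VERBOSE = '{"Use Disk": true, "Use Memory": false, "Deserialized": true, "Replication": 1}'
ALIAS = "DISK_ONLY"


def estimated_overhead_bytes(num_partitions, verbose=VERBOSE, alias=ALIAS):
    """Extra bytes spent on storage levels, assuming one entry per partition.

    Hypothetical helper: real event logs may repeat the field more than once
    per partition, so treat the result as a lower bound.
    """
    # json.dumps adds the surrounding quotes the alias would need in JSON.
    per_partition = len(verbose) - len(json.dumps(alias))
    return num_partitions * per_partition


# 79 bytes verbose vs. 11 bytes quoted alias: 68 bytes saved per partition.
print(estimated_overhead_bytes(100_000))  # 6800000
```

Under these assumptions, 100,000 cached partitions cost about 6.8 MB of log space on storage levels alone.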

Suggested quick win: use the names of the predefined levels to identify them in 
the event log.
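
A minimal sketch of the idea, written in Python for brevity rather than as a patch against Spark. The function names and the alias table are hypothetical; the table covers only three levels, and its flag values should be checked against the {{StorageLevel}} source before being relied on. Levels without a known alias fall back to the existing verbose form, so a reader could still parse either representation.

```python
import json

# Field order used for the lookup key; names match the event log example.
FIELDS = ("Use Disk", "Use Memory", "Deserialized", "Replication")

# Hypothetical, partial alias table keyed by
# (use_disk, use_memory, deserialized, replication).
ALIASES = {
    (True, False, False, 1): "DISK_ONLY",
    (False, True, True, 1): "MEMORY_ONLY",
    (True, True, True, 1): "MEMORY_AND_DISK",
}
LEVELS = {name: flags for flags, name in ALIASES.items()}


def encode_storage_level(level):
    """Return the alias for a predefined level, else the verbose dict."""
    key = tuple(level[field] for field in FIELDS)
    return ALIASES.get(key, level)


def decode_storage_level(value):
    """Accept either an alias string or the verbose dict."""
    if isinstance(value, str):
        return dict(zip(FIELDS, LEVELS[value]))
    return value


memory_only = {"Use Disk": False, "Use Memory": True,
               "Deserialized": True, "Replication": 1}
custom = {"Use Disk": True, "Use Memory": True,
          "Deserialized": False, "Replication": 3}

print(json.dumps(encode_storage_level(memory_only)))  # "MEMORY_ONLY"
print(json.dumps(encode_storage_level(custom)))       # verbose form, unchanged
```

Keeping the verbose form as a fallback means that older logs and any level missing from the table would remain readable.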


