[
https://issues.apache.org/jira/browse/SPARK-40662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Emil Ejbyfeldt resolved SPARK-40662.
------------------------------------
Resolution: Invalid
The increase was caused by a change in hashCode between Scala 2.12 and 2.13:
reading data written with a different Scala version shifted keys across
partitions, producing far more non-empty blocks to fetch.
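A minimal sketch of the mechanism, under stated assumptions: Spark's HashPartitioner assigns a key to a shuffle partition via `nonNegativeMod(key.hashCode, numPartitions)` (the helper below mirrors `org.apache.spark.util.Utils.nonNegativeMod`). The `Key` case class is hypothetical; the point is that Scala 2.13 changed case-class hashing (the class name now seeds the hash), so the same value can land in a different partition than it did on 2.12, turning previously empty shuffle blocks non-empty.

```scala
// Mirrors the modulo used by Spark's HashPartitioner to map a
// key's hashCode onto a non-negative partition index.
def nonNegativeMod(x: Int, mod: Int): Int = {
  val rawMod = x % mod
  rawMod + (if (rawMod < 0) mod else 0)
}

// Hypothetical key type for illustration. Because the partition depends
// entirely on hashCode, a hashCode that differs between Scala 2.12 and
// 2.13 moves the key to a different partition, changing which shuffle
// blocks are empty and inflating the MapStatus broadcast.
case class Key(id: Long)

val numPartitions = 200
val partition = nonNegativeMod(Key(42L).hashCode, numPartitions)
println(s"Key(42) -> partition $partition of $numPartitions")
```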
> Serialization of MapStatuses is sometimes much larger on Scala 2.13
> -------------------------------------------------------------------
>
> Key: SPARK-40662
> URL: https://issues.apache.org/jira/browse/SPARK-40662
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.3.0
> Reporter: Emil Ejbyfeldt
> Priority: Major
>
> We have observed a case where the same job run against Spark on Scala 2.13
> fails with an out-of-memory error because the broadcast for the MapStatuses
> is huge.
> In the logs around the time the job fails it tries to create a broadcast of
> size 4.8GiB.
> ```
> 2022-09-18 22:46:01,418 INFO memory.MemoryStore: Block broadcast_17 stored as
> values in memory (estimated size 4.8 GiB, free 12.9 GiB)
> ```
> The same broadcast of the MapStatuses for the same job running on 2.12 is
> only 391.5 MiB:
> ```
> 2022-09-18 16:11:58,753 INFO memory.MemoryStore: Block broadcast_17 stored as
> values in memory (estimated size 391.5 MiB, free 26.4 GiB)
> ```
> In this particular case the broadcast for the MapStatuses is more than 10
> times larger when using 2.13. This is not universal for all MapStatus
> broadcasts, as we have many other jobs using Scala 2.13 where the status
> is roughly the same size.
> This has been observed on 3.3.0, but I also tested it against 3.3.1-rc2 and
> a build of 3.4.0-SNAPSHOT, and both of those also reproduced the issue.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)