Emil Ejbyfeldt created SPARK-40662:
--------------------------------------
Summary: Serialization of MapStatuses is sometimes much larger on
Scala 2.13
Key: SPARK-40662
URL: https://issues.apache.org/jira/browse/SPARK-40662
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.3.0
Reporter: Emil Ejbyfeldt
We have observed a case where the same job, when run against Spark on Scala 2.13,
fails with an out-of-memory error because the broadcast for the MapStatuses is
huge.
In the logs around the time the job fails, it tries to create a broadcast of
size 4.8 GiB:
```
2022-09-18 22:46:01,418 INFO memory.MemoryStore: Block broadcast_17 stored as
values in memory (estimated size 4.8 GiB, free 12.9 GiB)
```
The same broadcast of the MapStatuses for the same job running on Scala 2.12 is
391.5 MiB:
```
2022-09-18 16:11:58,753 INFO memory.MemoryStore: Block broadcast_17 stored as
values in memory (estimated size 391.5 MiB, free 26.4 GiB)
```
In this particular case, the broadcast for the MapStatuses appears to be more than
10 times larger when using Scala 2.13. This is not universal for all MapStatus
broadcasts, as we have many other jobs using Scala 2.13 where the broadcast is
roughly the same size as on 2.12.
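As a quick sanity check on the size difference, the two MemoryStore log lines quoted above can be compared directly. The following is only a minimal sketch: the helper name `estimated_size_bytes` and the regular expression are illustrative assumptions, not part of Spark, and the log lines are joined onto a single line for parsing.

```python
import re

# Binary unit multipliers, matching the units printed in the log lines.
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

# Hypothetical pattern for the "estimated size" field of a MemoryStore line.
SIZE_RE = re.compile(r"estimated size ([\d.]+) (B|KiB|MiB|GiB|TiB)")


def estimated_size_bytes(log_line: str) -> float:
    """Return the estimated block size in bytes parsed from one log line."""
    match = SIZE_RE.search(log_line)
    if match is None:
        raise ValueError("no estimated size found in log line")
    return float(match.group(1)) * UNITS[match.group(2)]


# The two log lines from this report (Scala 2.13 first, then Scala 2.12).
log_213 = (
    "2022-09-18 22:46:01,418 INFO memory.MemoryStore: Block broadcast_17 "
    "stored as values in memory (estimated size 4.8 GiB, free 12.9 GiB)"
)
log_212 = (
    "2022-09-18 16:11:58,753 INFO memory.MemoryStore: Block broadcast_17 "
    "stored as values in memory (estimated size 391.5 MiB, free 26.4 GiB)"
)

ratio = estimated_size_bytes(log_213) / estimated_size_bytes(log_212)
print(f"Scala 2.13 broadcast is {ratio:.1f}x the Scala 2.12 size")
```

With the sizes reported here, the ratio comes out at roughly 12.6, consistent with the "more than 10 times" estimate.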
This has been observed on 3.3.0, but I also tested it against 3.3.1-rc2 and a
build of 3.4.0-SNAPSHOT, and both of those also reproduced the issue.