[ https://issues.apache.org/jira/browse/SPARK-20923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan resolved SPARK-20923.
---------------------------------
    Resolution: Fixed
    Fix Version/s: 2.3.0

> TaskMetrics._updatedBlockStatuses uses a lot of memory
> ------------------------------------------------------
>
>                 Key: SPARK-20923
>                 URL: https://issues.apache.org/jira/browse/SPARK-20923
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0
>            Reporter: Thomas Graves
>            Assignee: Thomas Graves
>              Labels: releasenotes
>             Fix For: 2.3.0
>
>
> The driver appears to use a ton of memory in certain cases to store the task
> metrics' updated block statuses. For instance, I had a user reading data from
> Hive and caching it. The number of tasks to read was around 62,000, they were
> using 1,000 executors, and the job ended up caching a couple of TBs of data. The
> driver kept running out of memory.
> I investigated, and it looks like 5 GB of a 10 GB heap was being used up
> by TaskMetrics._updatedBlockStatuses, because there are a lot of blocks.
> The updatedBlockStatuses field was already removed from the task end event under
> SPARK-20084. I don't see anything else that seems to be using it. Does anybody
> know if I missed something?
> If it's not being used, we should remove it; otherwise, we need to figure out a
> better way of doing this so it doesn't use so much memory.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
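A rough back-of-envelope check shows how per-task updated block statuses retained on the driver can plausibly reach the ~5 GB observed. The entries-per-task and bytes-per-entry figures below are assumptions chosen for illustration, not measurements from Spark internals; only the task count comes from the report:

```python
# Illustrative estimate of driver heap held by retained
# TaskMetrics updated-block-status entries.

num_tasks = 62_000        # tasks in the caching job (from the report)
entries_per_task = 250    # assumed (BlockId, BlockStatus) pairs per task
bytes_per_entry = 300     # assumed JVM object overhead plus fields

total_bytes = num_tasks * entries_per_task * bytes_per_entry
print(f"~{total_bytes / 1024**3:.1f} GiB")  # prints "~4.3 GiB"
```

With these assumed sizes the retained statuses alone account for several gigabytes, which is consistent with half of a 10 GB driver heap being consumed.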