Tathagata Das created SPARK-5523:
------------------------------------
Summary: TaskMetrics and TaskInfo have innumerable copies of the
hostname string
Key: SPARK-5523
URL: https://issues.apache.org/jira/browse/SPARK-5523
Project: Spark
Issue Type: Bug
Reporter: Tathagata Das
Each TaskMetrics and TaskInfo object stores the hostname associated with its task.
As these objects are created (directly or through deserialization of RPC messages),
each one gets a separate String object for the hostname, even though most of
them hold the same string data. This results in thousands of duplicate String
objects, increasing the memory requirement of the driver.
The hostname strings can easily be deduplicated when deserializing a TaskMetrics
object, or when creating a TaskInfo object (in TaskSchedulerImpl).
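
One possible way to do the deduplication is to pass every hostname through a small
canonicalizing cache, so that all equal hostnames share one String instance. The
sketch below is only an illustration of that idea: HostnameInterner and TaskRecord
are hypothetical names, not Spark classes, and TaskRecord is a simplified stand-in
for TaskInfo. The JVM's built-in String.intern() would be another option.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical helper (not part of Spark): keeps one canonical String
// instance per distinct hostname value.
object HostnameInterner {
  private val canonical = new ConcurrentHashMap[String, String]()

  def intern(host: String): String = {
    if (host == null) return null
    // putIfAbsent returns the already-stored instance, or null if
    // this call stored `host` as the canonical copy.
    val previous = canonical.putIfAbsent(host, host)
    if (previous == null) host else previous
  }
}

// Simplified stand-in for a task record; the real TaskInfo has more fields.
final case class TaskRecord(taskId: Long, host: String)

object DedupDemo {
  def main(args: Array[String]): Unit = {
    // new String(...) forces distinct objects, much as deserialization would.
    val a = TaskRecord(1L, HostnameInterner.intern(new String("worker-1.example.com")))
    val b = TaskRecord(2L, HostnameInterner.intern(new String("worker-1.example.com")))
    println(a.host eq b.host) // prints "true": both records share one String
  }
}
```

With this approach the number of retained hostname strings is bounded by the number
of distinct hosts in the cluster rather than by the number of tasks.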
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)