[
https://issues.apache.org/jira/browse/SPARK-17930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15577506#comment-15577506
]
Guoqiang Li commented on SPARK-17930:
-------------------------------------
If a stage contains a large number of tasks, e.g. one million, this code has to
create one million SerializerInstance instances, which seriously hurts
scheduling performance. At a minimum, the SerializerInstance could be reused
per stage.
> The SerializerInstance instance used when deserializing a TaskResult is not
> reused
> -----------------------------------------------------------------------------------
>
> Key: SPARK-17930
> URL: https://issues.apache.org/jira/browse/SPARK-17930
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 1.6.1, 2.0.1
> Reporter: Guoqiang Li
>
> The following code is called when the DirectTaskResult instance is
> deserialized
> {noformat}
>   def value(): T = {
>     if (valueObjectDeserialized) {
>       valueObject
>     } else {
>       // Each deserialization creates a new instance of SerializerInstance, which is very time-consuming
>       val resultSer = SparkEnv.get.serializer.newInstance()
>       valueObject = resultSer.deserialize(valueBytes)
>       valueObjectDeserialized = true
>       valueObject
>     }
>   }
> {noformat}
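A minimal sketch of the proposed reuse, outside of Spark: instead of calling `newInstance()` on every `value()` call, the result holder is handed a single serializer instance to reuse. The `SerializerInstance` and `DirectTaskResult` classes below are simplified stand-ins (plain Java serialization), not Spark's actual implementations; the point is only the caching pattern.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.nio.ByteBuffer

// Stand-in for Spark's SerializerInstance, backed by plain Java serialization.
class SerializerInstance {
  def serialize[T](t: T): ByteBuffer = {
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    oos.writeObject(t)
    oos.close()
    ByteBuffer.wrap(bos.toByteArray)
  }

  def deserialize[T](bytes: ByteBuffer): T = {
    val arr = new Array[Byte](bytes.remaining())
    bytes.duplicate().get(arr)
    new ObjectInputStream(new ByteArrayInputStream(arr)).readObject().asInstanceOf[T]
  }
}

// Simplified DirectTaskResult that reuses one shared SerializerInstance
// instead of creating a fresh one on every deserialization.
class DirectTaskResult[T](valueBytes: ByteBuffer, cachedSer: SerializerInstance) {
  private var valueObjectDeserialized = false
  private var valueObject: T = _

  def value(): T = {
    if (valueObjectDeserialized) {
      valueObject
    } else {
      // Reuse the shared instance rather than building a new serializer here.
      valueObject = cachedSer.deserialize[T](valueBytes)
      valueObjectDeserialized = true
      valueObject
    }
  }
}
```

With one `SerializerInstance` shared across all task results of a stage, a million `value()` calls pay the construction cost once instead of a million times. (In real Spark, this would also require checking that the configured serializer's instances are safe to share across the threads involved.)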