[ 
https://issues.apache.org/jira/browse/SPARK-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-5269:
-----------------------------
    Target Version/s: 1.6.0

> BlockManager.dataDeserialize always creates a new serializer instance
> ---------------------------------------------------------------------
>
>                 Key: SPARK-5269
>                 URL: https://issues.apache.org/jira/browse/SPARK-5269
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Ivan Vergiliev
>            Assignee: Matt Cheah
>              Labels: performance, serializers
>
> BlockManager.dataDeserialize always creates a new serializer instance, which
> can be slow. I'm using Kryo serialization with a custom registrator, and the
> registrator's register method accounts for about 15% of the execution time in
> my profiles. This started happening after I increased the number of keys in a
> job with a shuffle phase by a factor of 40.
>
> One solution I can think of is to keep a ThreadLocal SerializerInstance for
> the defaultSerializer, and only create a new instance when a custom
> serializer is passed in. AFAICT a custom serializer is passed only from
> DiskStore.getValues, which in turn depends on the serializer passed to
> ExternalSorter. I don't know how often this path is used, but I think this
> can still be a good solution for the standard use case.
>
> Also, ExternalSorter already has a SerializerInstance, so if the getValues
> method is only called from a single thread, maybe we can pass that instance
> directly?
>
> I'd be happy to try a patch, but I would probably need confirmation from
> someone that this approach would indeed work (or an idea for another one).
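
The ThreadLocal idea proposed in the description can be sketched roughly as follows. This is a minimal, self-contained illustration and not Spark code: FakeSerializerInstance, CREATED, and the dataDeserialize signature shown here are hypothetical stand-ins that use only the JDK, and the constructor counter merely simulates the cost of running a Kryo registrator. It also assumes a cached instance is never used re-entrantly on the same thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class SerializerCacheSketch {

    /** Hypothetical stand-in for a serializer instance that is expensive to build. */
    static final class FakeSerializerInstance {
        /** Counts constructions, so the effect of caching is observable. */
        static final AtomicInteger CREATED = new AtomicInteger();

        FakeSerializerInstance() {
            // In the reported case, this is where a custom Kryo registrator would run.
            CREATED.incrementAndGet();
        }

        String deserialize(byte[] bytes) {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    /** One lazily created default instance per thread. */
    private static final ThreadLocal<FakeSerializerInstance> DEFAULT_INSTANCE =
            ThreadLocal.withInitial(FakeSerializerInstance::new);

    /**
     * Reuses the calling thread's cached default instance unless a custom
     * serializer factory is supplied, in which case a fresh instance is built.
     */
    static String dataDeserialize(byte[] bytes, Supplier<FakeSerializerInstance> custom) {
        FakeSerializerInstance instance =
                (custom == null) ? DEFAULT_INSTANCE.get() : custom.get();
        return instance.deserialize(bytes);
    }

    public static void main(String[] args) {
        byte[] block = "block".getBytes(StandardCharsets.UTF_8);

        // Default path: many calls on one thread share a single instance.
        for (int i = 0; i < 1000; i++) {
            dataDeserialize(block, null);
        }
        System.out.println(FakeSerializerInstance.CREATED.get()); // prints 1

        // Custom path: each call builds its own instance.
        dataDeserialize(block, FakeSerializerInstance::new);
        System.out.println(FakeSerializerInstance.CREATED.get()); // prints 2
    }
}
```

Whether this is safe in Spark itself depends on details the description leaves open, such as whether the real serializer instances tolerate reuse across calls and whether threads in a pool would retain stale instances after configuration changes.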



