koert kuipers created SPARK-1811:
------------------------------------

             Summary: Support resizable output buffer for kryo serializer
                 Key: SPARK-1811
                 URL: https://issues.apache.org/jira/browse/SPARK-1811
             Project: Spark
          Issue Type: New Feature
          Components: Spark Core
    Affects Versions: 1.0.0
            Reporter: koert kuipers
            Priority: Minor


Currently the size of the Kryo serializer's output buffer is set with
spark.kryoserializer.buffer.mb.

The problem with this setting is that it is one-size-fits-all: it has to be
set to the largest size any task needs, even if only a single task out of many
needs a buffer that big. A resizable buffer would let most tasks use a modestly
sized buffer, while the occasional task that needs a very large buffer can
still get one, at a cost: whenever the buffer fills up, a new buffer of double
the size is allocated and the existing contents are copied over.

The buffer is an instance of Kryo's Output class, which supports resizing when
maxCapacity is set larger than capacity. I suggest we add a setting
spark.kryoserializer.buffer.max.mb that sets Output's maxCapacity and defaults
to spark.kryoserializer.buffer.mb, so that by default the buffer stays
fixed-size as it is today.
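
A minimal sketch of how the two settings could be resolved, assuming the
configuration is available as a plain Map of strings. KryoBufferSettings is a
hypothetical helper, not Spark code, and the 2 MB fallback used when
spark.kryoserializer.buffer.mb is unset is an assumption of this sketch. The
resulting byte counts are what would be passed to Kryo's
Output(int bufferSize, int maxBufferSize) constructor.

```java
import java.util.Map;

/**
 * Hypothetical helper sketching the proposed settings: the max size defaults
 * to the initial size, which keeps the current fixed-size behavior.
 */
final class KryoBufferSettings {
    static final String BUFFER_MB = "spark.kryoserializer.buffer.mb";
    static final String BUFFER_MAX_MB = "spark.kryoserializer.buffer.max.mb";
    // Assumption for this sketch: fallback initial size when BUFFER_MB is unset.
    static final int DEFAULT_BUFFER_MB = 2;
    private static final int BYTES_PER_MB = 1024 * 1024;

    final int bufferBytes;    // initial capacity
    final int maxBufferBytes; // maxCapacity

    private KryoBufferSettings(int bufferBytes, int maxBufferBytes) {
        this.bufferBytes = bufferBytes;
        this.maxBufferBytes = maxBufferBytes;
    }

    static KryoBufferSettings resolve(Map<String, String> conf) {
        int bufferMb = Integer.parseInt(
            conf.getOrDefault(BUFFER_MB, String.valueOf(DEFAULT_BUFFER_MB)));
        // As proposed: the max defaults to the initial size (no resizing).
        int maxMb = Integer.parseInt(
            conf.getOrDefault(BUFFER_MAX_MB, String.valueOf(bufferMb)));
        // Sketch's choice: never let the max fall below the initial size.
        maxMb = Math.max(maxMb, bufferMb);
        // multiplyExact throws instead of silently overflowing int.
        return new KryoBufferSettings(
            Math.multiplyExact(bufferMb, BYTES_PER_MB),
            Math.multiplyExact(maxMb, BYTES_PER_MB));
    }
}
```

With only buffer.mb=8 set, both values resolve to 8 MB and nothing changes; with
buffer.mb=2 and buffer.max.mb=64, tasks start at 2 MB and only the ones that
need it grow toward 64 MB. Clamping the maximum up to the initial size is a
design choice of this sketch; rejecting such a configuration outright would be
an equally reasonable alternative.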

Pull request for this jira:
https://github.com/apache/spark/pull/735






--
This message was sent by Atlassian JIRA
(v6.2#6252)
