koert kuipers created SPARK-1811:
------------------------------------
Summary: Support resizable output buffer for kryo serializer
Key: SPARK-1811
URL: https://issues.apache.org/jira/browse/SPARK-1811
Project: Spark
Issue Type: New Feature
Components: Spark Core
Affects Versions: 1.0.0
Reporter: koert kuipers
Priority: Minor
Currently the size of the Kryo serializer's output buffer is set with
spark.kryoserializer.buffer.mb.
The problem with this setting is that it is one-size-fits-all, so it ends up
set to the largest size any task needs, even if only a single task out of many
needs a buffer that big. A resizable buffer would let most tasks use a modestly
sized buffer, while the occasional task that needs a really big buffer can still
get one, at a cost: as the buffer grows it is reallocated repeatedly, each new
allocation doubling the size and copying the existing contents over.
The buffer is an instance of Kryo's Output class, which supports resizing when
maxCapacity is set larger than capacity. I suggest adding a setting,
spark.kryoserializer.buffer.max.mb, that defaults to
spark.kryoserializer.buffer.mb and sets Output's maxCapacity.
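A hypothetical sketch of how the proposed setting could be resolved into the two sizes handed to Output. Assumptions, all labeled: a plain Map stands in for SparkConf, the default of 2 MB for spark.kryoserializer.buffer.mb is assumed for illustration, and the check that the maximum is not below the initial size is an added safeguard rather than part of the proposal. Kryo's Output offers a constructor taking an initial buffer size and a maximum buffer size, which is where the resolved pair would presumably go; this is not Spark's actual code.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of resolving the proposed setting.
 * A plain Map stands in for SparkConf; this is not Spark's actual code.
 */
class KryoBufferSettingsSketch {
    static final String BUFFER_MB = "spark.kryoserializer.buffer.mb";
    static final String BUFFER_MAX_MB = "spark.kryoserializer.buffer.max.mb"; // proposed key
    static final String ASSUMED_DEFAULT_MB = "2"; // assumed default, for illustration only

    /** Returns {capacity, maxCapacity} in bytes for a resizable output buffer. */
    static int[] resolve(Map<String, String> conf) {
        String bufferMb = conf.getOrDefault(BUFFER_MB, ASSUMED_DEFAULT_MB);
        // The proposal: the max defaults to buffer.mb, so nothing changes unless it is set.
        String maxMb = conf.getOrDefault(BUFFER_MAX_MB, bufferMb);
        int capacity = toBytes(Integer.parseInt(bufferMb));
        int maxCapacity = toBytes(Integer.parseInt(maxMb));
        if (maxCapacity < capacity) { // added safeguard, not part of the proposal
            throw new IllegalArgumentException(BUFFER_MAX_MB + " must be >= " + BUFFER_MB);
        }
        return new int[] {capacity, maxCapacity};
    }

    private static int toBytes(int mb) {
        // Math.multiplyExact throws ArithmeticException instead of silently overflowing int.
        return Math.multiplyExact(mb, 1024 * 1024);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(BUFFER_MB, "1");
        int[] fixed = resolve(conf);
        System.out.println(fixed[0] + " / " + fixed[1]); // prints "1048576 / 1048576": no growth

        conf.put(BUFFER_MAX_MB, "64");
        int[] growable = resolve(conf);
        System.out.println(growable[0] + " / " + growable[1]); // prints "1048576 / 67108864"
    }
}
```

Defaulting the maximum to the existing setting means capacity equals maxCapacity for anyone who does not opt in, so the buffer never resizes and current behavior is preserved.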
Pull request for this JIRA:
https://github.com/apache/spark/pull/735
--
This message was sent by Atlassian JIRA
(v6.2#6252)