Hi,

I am wondering about the implementation of KryoSerializer, specifically why
it does not use KryoPool, which the Kryo project itself recommends.

Looking at the code, it seems that KryoSerializer.newInstance is frequently
called, followed by a serialize call, after which the instance goes out of
scope. This appears to cause frequent creation of Kryo instances, something
which the Kryo documentation says is expensive.

Flame graphs of our own running software (it processes a lot of small jobs)
suggest that a good amount of time is spent on this.

I have a small patch we are using internally which implements a reused
KryoPool inside KryoSerializer (not KryoSerializerInstance) in order to
avoid the creation of many Kryo instances. I am wondering if I am missing
a reason why this isn't done already. If there is no such reason, would
Spark be interested in merging a patch like this, and how might I go about
submitting it?
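
To illustrate the general idea, here is a minimal sketch in Java. It is not
the actual patch and it does not use Kryo's KryoPool API: InstancePool and
its methods are hypothetical names, and the pooled type T stands in for an
expensive-to-create object such as a Kryo instance. It only shows the
borrow/release pattern that avoids constructing a new instance per call.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical generic pool, for illustration only. This is NOT Kryo's
// KryoPool and NOT Spark code; T stands in for an expensive object.
final class InstancePool<T> {
    // Idle instances waiting to be reused; safe for concurrent access.
    private final ConcurrentLinkedQueue<T> idle = new ConcurrentLinkedQueue<>();
    private final Supplier<T> factory;
    // Counts how many instances were actually constructed.
    private final AtomicInteger created = new AtomicInteger();

    InstancePool(Supplier<T> factory) {
        this.factory = factory;
    }

    // Reuse an idle instance if one exists, otherwise create a new one.
    T borrow() {
        T instance = idle.poll();
        if (instance == null) {
            created.incrementAndGet();
            instance = factory.get();
        }
        return instance;
    }

    // Hand the instance back so a later caller can reuse it.
    void release(T instance) {
        idle.offer(instance);
    }

    // Borrow, run the task, and always release, even if the task throws.
    <R> R run(Function<T, R> task) {
        T instance = borrow();
        try {
            return task.apply(instance);
        } finally {
            release(instance);
        }
    }

    int createdCount() {
        return created.get();
    }
}
```

With something like this held by the long-lived serializer object rather
than by each short-lived instance, repeated calls to run(...) from one
thread would construct the pooled object only once; under concurrency the
pool grows to roughly the number of simultaneous borrowers.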

Thanks,

Patrick
