Hi, I have a question about the implementation of KryoSerializer, specifically why it does not use KryoPool, which the Kryo documentation itself recommends.
Looking at the code, it seems that KryoSerializer.newInstance is frequently called, followed by a serialize call, after which the instance goes out of scope. This appears to cause frequent creation of Kryo instances, which the Kryo documentation describes as expensive. Flame graphs of our own running software (it processes a lot of small jobs) suggest that a good amount of time is spent on this.

I have a small patch we are using internally that implements a reused KryoPool inside KryoSerializer (not KryoSerializerInstance) in order to avoid creating so many Kryo instances.

Am I missing a reason why this isn't done already? If not, would this be a patch that Spark would be interested in merging, and how should I go about submitting it?

Thanks,
Patrick
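
P.S. In case it helps the discussion, here is a minimal sketch of the borrow/release idea. It is not the actual patch and deliberately uses no Kryo or Spark classes, so it runs with the JDK alone. ExpensiveCodec is a hypothetical stand-in for a Kryo instance, and SimplePool is a hypothetical stand-in for what KryoPool provides; all names here are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

public class PoolSketch {

    /** Hypothetical stand-in for an object that is costly to construct (e.g. a Kryo instance). */
    static final class ExpensiveCodec {
        // Counts constructions so the effect of pooling is observable.
        static final AtomicInteger created = new AtomicInteger();

        ExpensiveCodec() {
            created.incrementAndGet();
        }

        byte[] encode(String s) {
            return s.getBytes(StandardCharsets.UTF_8);
        }
    }

    /** Hypothetical unbounded pool: reuse an idle instance if one exists, otherwise create one. */
    static final class SimplePool<T> {
        private final ConcurrentLinkedQueue<T> idle = new ConcurrentLinkedQueue<>();
        private final Supplier<T> factory;

        SimplePool(Supplier<T> factory) {
            this.factory = factory;
        }

        T borrow() {
            T instance = idle.poll();
            return instance != null ? instance : factory.get();
        }

        void release(T instance) {
            idle.offer(instance);
        }

        /** Borrows an instance, runs the body, and always returns the instance to the pool. */
        <R> R run(Function<T, R> body) {
            T instance = borrow();
            try {
                return body.apply(instance);
            } finally {
                release(instance);
            }
        }
    }

    public static void main(String[] args) {
        // The pool lives in the long-lived object (the serializer in my case),
        // not in the short-lived per-call instance.
        SimplePool<ExpensiveCodec> pool = new SimplePool<>(ExpensiveCodec::new);
        for (int i = 0; i < 1000; i++) {
            final int n = i;
            pool.run(codec -> codec.encode("job-" + n));
        }
        System.out.println("instances created: " + ExpensiveCodec.created.get());
    }
}
```

With a single thread, the loop above constructs only one ExpensiveCodec for all 1000 calls, whereas constructing a fresh instance per call would construct 1000. Under concurrency the pool grows to roughly the number of simultaneous borrowers, since an instance is only created when none is idle.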