I don't know; possibly just because it wasn't available whenever Kryo was first used in the project.
Skimming the code, the KryoSerializerInstance looks like a wrapper that provides a Kryo object to do work. It already maintains a 'pool' of just 1 instance. Is the point that KryoSerializer can share a KryoPool across KryoSerializerInstances that provides them with a Kryo rather than allocate a new one? makes sense, though I believe the concern is always whether that somehow shares state or config in a way that breaks something. I see there's already a reset() call in here to try to avoid that. Well, seems worth a PR, especially if you can demonstrate some performance gains. On Wed, Oct 24, 2018 at 3:09 PM Patrick Brown <patrick.barry.br...@gmail.com> wrote: > > Hi, > > I am wondering about the implementation of KryoSerializer, specifically the > lack of use of KryoPool, which is recommended by Kryo themselves. > > Looking at the code, it seems that frequently KryoSerializer.newInstance is > called, followed by a serialize and then this instance goes out of scope, > this seems like it causes frequent creation of Kryo instances, something > which the Kryo documentation says is expensive. > > By doing flame graphs on our own running software (it processes a lot of > small jobs) it seems like a good amount of time is spent on this. > > I have a small patch we are using internally which implements a reused > KryoPool inside KryoSerializer (not KryoSerializerInstance) in order to avoid > the creation of many Kryo instances. I am wonder if I am missing something as > to why this isn't done already. If not I am wondering if this might be a > patch that Spark would be interested in merging in, and how I might go about > that. > > Thanks, > > Patrick --------------------------------------------------------------------- To unsubscribe e-mail: dev-unsubscr...@spark.apache.org