Based on my, limited, read through of the code that uses this, it seems
like often a new KryoSerializerInstance is created for whatever task and
then it falls out of scope, instead of being reused. I did notice that
comment about a pool size of 1, however if the use is generally how I just
described, I don't think that really does anything.

I would be happy to open a PR but, pardon my ignorance, how would I go
about doing that properly? Do I need to open a JIRA issue first? Also how
would I demonstrate performance gains? Do you guys use something like
ScalaMeter?

Thanks for your help!

On Wed, Oct 24, 2018 at 2:37 PM Sean Owen <sro...@gmail.com> wrote:

> I don't know; possibly just because it wasn't available whenever Kryo
> was first used in the project.
>
> Skimming the code, the KryoSerializerInstance looks like a wrapper
> that provides a Kryo object to do work. It already maintains a 'pool'
> of just 1 instance. Is the point that KryoSerializer can share a
> KryoPool across KryoSerializerInstances that provides them with a Kryo
> rather than allocate a new one? makes sense, though I believe the
> concern is always whether that somehow shares state or config in a way
> that breaks something. I see there's already a reset() call in here to
> try to avoid that.
>
> Well, seems worth a PR, especially if you can demonstrate some
> performance gains.
>
> On Wed, Oct 24, 2018 at 3:09 PM Patrick Brown
> <patrick.barry.br...@gmail.com> wrote:
> >
> > Hi,
> >
> > I am wondering about the implementation of KryoSerializer, specifically
> the lack of use of KryoPool, which is recommended by Kryo themselves.
> >
> > Looking at the code, it seems that frequently KryoSerializer.newInstance
> is called, followed by a serialize and then this instance goes out of
> scope, this seems like it causes frequent creation of Kryo instances,
> something which the Kryo documentation says is expensive.
> >
> > By doing flame graphs on our own running software (it processes a lot of
> small jobs) it seems like a good amount of time is spent on this.
> >
> > I have a small patch we are using internally which implements a reused
> KryoPool inside KryoSerializer (not KryoSerializerInstance) in order to
> avoid the creation of many Kryo instances. I am wonder if I am missing
> something as to why this isn't done already. If not I am wondering if this
> might be a patch that Spark would be interested in merging in, and how I
> might go about that.
> >
> > Thanks,
> >
> > Patrick
>

Reply via email to