I don't know; possibly just because it wasn't available whenever Kryo
was first used in the project.

Skimming the code, the KryoSerializerInstance looks like a wrapper
that provides a Kryo object to do work. It already maintains a 'pool'
of just 1 instance. Is the point that KryoSerializer can share a
KryoPool across KryoSerializerInstances that provides them with a Kryo
rather than allocate a new one? makes sense, though I believe the
concern is always whether that somehow shares state or config in a way
that breaks something. I see there's already a reset() call in here to
try to avoid that.

Well, seems worth a PR, especially if you can demonstrate some
performance gains.

On Wed, Oct 24, 2018 at 3:09 PM Patrick Brown
<patrick.barry.br...@gmail.com> wrote:
>
> Hi,
>
> I am wondering about the implementation of KryoSerializer, specifically the 
> lack of use of KryoPool, which is recommended by Kryo themselves.
>
> Looking at the code, it seems that frequently KryoSerializer.newInstance is 
> called, followed by a serialize and then this instance goes out of scope, 
> this seems like it causes frequent creation of Kryo instances, something 
> which the Kryo documentation says is expensive.
>
> By doing flame graphs on our own running software (it processes a lot of 
> small jobs) it seems like a good amount of time is spent on this.
>
> I have a small patch we are using internally which implements a reused 
> KryoPool inside KryoSerializer (not KryoSerializerInstance) in order to avoid 
> the creation of many Kryo instances. I am wonder if I am missing something as 
> to why this isn't done already. If not I am wondering if this might be a 
> patch that Spark would be interested in merging in, and how I might go about 
> that.
>
> Thanks,
>
> Patrick

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Reply via email to