I submitted a PR for this: https://github.com/apache/spark/pull/25024
On Wed, Mar 27, 2019 at 4:19 PM Erik Erlandson <eerla...@redhat.com> wrote: > I describe some of the details here: > https://issues.apache.org/jira/browse/SPARK-27296 > > The short version of the story is that aggregating data structures (UDTs) > used by UDAFs are serialized to a Row object, and de-serialized, for every > row in a data frame. > Cheers, > Erik > >