I describe some of the details here: https://issues.apache.org/jira/browse/SPARK-27296
The short version of the story is that aggregating data structures (UDTs) used by UDAFs are serialized to a Row object, and de-serialized, for every row in a data frame. Cheers, Erik