After my talk on T-Digests in Spark at Spark Summit East, there were some requests for a UDAF-based interface for working with Datasets. I'm pleased to announce that I have released a library for T-Digest sketching with UDAFs:
https://github.com/isarn/isarn-sketches-spark

This initial release supports Scala. Future releases will add PySpark bindings and additional tools for leveraging T-Digests in ML pipelines.

Cheers!
Erik