Hi all,

There's been longstanding demand for statically typed Datasets of Avro. Functionality from the now-deprecated Databricks Spark-Avro project was folded into Spark, but it still only provides DataFrames over Avro data. As discussed in the PR below, not having fully statically typed Datasets of Avro still carries drawbacks.

There's an open PR adding a first-class Encoder for statically typed Datasets of Avro:

PR: https://github.com/apache/spark/pull/22878
JIRA: https://issues.apache.org/jira/browse/SPARK-25789

(The work originated in Databricks/spark-avro: PR https://github.com/databricks/spark-avro/pull/217, issue https://github.com/databricks/spark-avro/issues/169.)

We've tested the contents of this PR widely over complex, deeply nested Avro structures. It seems ready for a final review and nearly ready to merge.

Alek Eskilson
GitHub: bdrillard