Hello, everyone! While working on HUDI-3549 <https://issues.apache.org/jira/browse/HUDI-3549>, we were surprised to discover that Hudi actually bundles the "spark-avro" dependency *by default*.
This is problematic because "spark-avro" is tightly coupled with some of the other Spark components that make up Spark's core distribution, i.e. components packaged in Spark itself rather than shipped as external packages; one example is "spark-sql".

In HUDI-3549 <https://issues.apache.org/jira/browse/HUDI-3549> itself, the problem unfolded as follows:

1. We built "hudi-spark-bundle", which got "spark-avro" 3.2.1 bundled along with it.
2. @Sivabalan tried to use this Hudi bundle with Spark 3.2.0.
3. It failed because "spark-avro" 3.2.1 is *not compatible* with "spark-sql" 3.2.0 (due to https://github.com/apache/spark/pull/34978, which fixed a typo and renamed internal API methods in DataSourceUtils).

To avoid such problems going forward, our proposal is to:

1. *Unbundle* "spark-avro" from Hudi bundles by default. Practically, this means Hudi users would now need to specify spark-avro via the `--packages` flag, since it's not part of Spark's core distribution (see the example command at the end of this email).
2. (Optional) If the community still sees value in bundling (and shading) "spark-avro" in some cases, we can add a Maven profile that allows doing that *ad hoc*.

We've put up PR #4955 <https://github.com/apache/hudi/pull/4955> with the proposed changes. Looking forward to your feedback.
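P.S. For illustration only, here's a sketch of how launching Spark might look under the proposal. The bundle jar path below is a placeholder and the spark-avro coordinates are just an example; the spark-avro version should be chosen to match the Spark version in use (e.g. 3.2.0 for Spark 3.2.0):

    # The Hudi bundle would no longer ship spark-avro, so pull it in explicitly via --packages
    spark-shell \
      --jars /path/to/hudi-spark-bundle.jar \
      --packages org.apache.spark:spark-avro_2.12:3.2.0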