I also agree that we should keep the same behavior for existing users. But I would like some clarity on the path towards unbundling spark-avro. Are we always going to publish only bundled artifacts (Hudi Spark bundle with spark-avro) to Maven, and ask devs who want the unbundled version to build Hudi on their own? I don't think many would ever go that route; most will stick to the officially released artifacts. So, if we plan to eventually deprecate or stop bundling spark-avro, we may need to think this through.
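
To make the unbundled route concrete: spark-avro has to match the Spark version exactly (3.2.1 vs 3.2.0 was enough to break things in HUDI-3549), so users would have to derive the coordinate from their own Spark version. A rough sketch in Python, for illustration only; the helper names are made up, and the Scala binary version 2.12 is an assumption:

```python
# Sketch only: pin spark-avro to the exact Spark version in use, so the two
# cannot drift apart by a patch release. Helper names are hypothetical.

def spark_avro_coordinate(spark_version: str, scala_binary: str = "2.12") -> str:
    """Return the Maven coordinate for spark-avro matching the given Spark version."""
    parts = spark_version.split(".")
    # Require a full major.minor.patch version: a patch-level mismatch matters here.
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected major.minor.patch, got {spark_version!r}")
    return f"org.apache.spark:spark-avro_{scala_binary}:{spark_version}"


def packages_flag(spark_version: str, extra_packages=()) -> str:
    """Build the --packages argument: any extra coordinates plus the matching spark-avro."""
    coords = list(extra_packages) + [spark_avro_coordinate(spark_version)]
    return "--packages " + ",".join(coords)


if __name__ == "__main__":
    print(packages_flag("3.2.0"))
```

For Spark 3.2.0 this prints `--packages org.apache.spark:spark-avro_2.12:3.2.0`; the Hudi bundle coordinate would go in through `extra_packages`.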

On Tue, 8 Mar 2022 at 19:20, Y Ethan Guo <ethan.guoyi...@gmail.com> wrote:

> Thanks for raising the discussion. I agree that from the usability
> standpoint from the user side, we should keep the same expectation
> regarding "--packages" for Spark and reliance bundled spark-avro for
> utilities bundle in this release.
>
> Given that there are Spark API changes between 3.2.0 and 3.2.1, do we also
> add Spark profiles for patch versions besides the latest, e.g. 3.2.0, as
> well? If a user has Spark 3.2.0 in their environment, they have to upgrade
> both Hudi and Spark if they want to upgrade Hudi release. Do we know if
> this is a major use case?
>
> Best,
> - Ethan
>
> On Tue, Mar 8, 2022 at 6:15 PM Vinoth Chandar <vin...@apache.org> wrote:
>
> > Thanks Alexey.
> >
> > This was actually the case for a while now, I think. From what I can see,
> > our quickstart for spark still suggests passing spark-avro in via
> > --packages, but utilities bundle related examples are relying on the fact
> > that this is pre-bundled.
> >
> > I do acknowledge that with recent Spark 3.x versions, breakages have become
> > much more frequent, amplifying this pain. However, to prevent jobs from
> > failing upon upgrade (i.e forcing everyone to redeploy streaming + batch
> > job with the --packages flag), I would prefer if we actually kept the same
> > bundling behavior with the following simplifications.
> >
> > 1. We have three spark profiles now - spark2, spark3.1.x, and spark3
> >    (3.2.1). We continue to bundle spark-avro and support the latest spark
> >    minor version
> > 2. We retain and make the docs clearer about how users can "optionally"
> >    unbundle and deploy for other versions.
> >
> > Given other large features going out, turned on by default this release,
> > not sure if its a good idea to introduce a breaking change like this.
> >
> > Thanks
> > Vinoth
> >
> > On Tue, Mar 8, 2022 at 1:32 PM Alexey Kudinkin <ale...@onehouse.ai> wrote:
> >
> > > Hello, everyone!
> > >
> > > While working on HUDI-3549 <https://issues.apache.org/jira/browse/HUDI-3549>,
> > > we've surprisingly discovered that Hudi actually bundles "spark-avro"
> > > dependency *by default*.
> > >
> > > This is problematic b/c "spark-avro" is tightly coupled with some of the
> > > other Spark components making up its core distribution (ie being packaged
> > > in Spark itself, not an external packages, one example of that is
> > > "spark-sql")
> > >
> > > In regards to HUDI-3549 <https://issues.apache.org/jira/browse/HUDI-3549>
> > > itself, the problem in there unfolded like following:
> > >
> > > 1. We've built "hudi-spark-bundle" which got "spark-avro" 3.2.1 bundled
> > >    along with it
> > > 2. @Sivabalan tried to use this Hudi bundle w/ Spark 3.2.0
> > > 3. It failed b/c "spark-avro" 3.2.1 is *not compatible* w/ "spark-sql"
> > >    3.2.0 (b/c of https://github.com/apache/spark/pull/34978, fixing typo
> > >    and renaming Internal API methods DataSourceUtils)
> > >
> > > To avoid this problems going forward, our proposal is to
> > >
> > > 1. *Unbundle* "spark-avro" from Hudi bundles by default (practically
> > >    this means that Hudi users would need to now specify spark-avro via
> > >    `--packages` flag, since it's not part of Spark's core distribution)
> > > 2. (Optional) If community still sees value in bundling (and shading)
> > >    "spark-avro" in some cases, we can add Maven profile that would allow
> > >    to do that *ad hoc*.
> > >
> > > We've put a PR#4955 <https://github.com/apache/hudi/pull/4955> with the
> > > proposed changes.
> > >
> > > Looking forward to your feedback.

--
Regards,
-Sivabalan