Hi @lamberken,

Sorry I missed this earlier. I also left this comment in the PR. I think Vinoth brings up a valid point. Although your PR intends to spare users from having to care about Scala 2.11 versus Scala 2.12, we also need to avoid coupling Hudi with a specific spark-avro version, be it 2.4.4 or 3.0-preview2.
Please consider my vote as -1.

Thanks,
Sudha

On Wed, Feb 5, 2020 at 2:11 PM lamberken <[email protected]> wrote:

> Dear team,
>
> With the 0.5.1 release, users need to add
> `org.apache.spark:spark-avro_2.11:2.4.4` when starting a Hudi command, as
> shown below:
>
> /----------------------------------------------------------------------/
> spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
>   --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating,org.apache.spark:spark-avro_2.11:2.4.4 \
>   --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
> /----------------------------------------------------------------------/
>
> From the spark-avro guide [1], we know that the spark-avro module is
> external; it is not included in spark-2.4.4-bin-hadoop2.7.tgz.
> So it may be better to relocate the spark-avro dependency using
> maven-shade-plugin. If so, users would start Hudi the same way as with
> version 0.5.0:
>
> /----------------------------------------------------------------------/
> spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
>   --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating \
>   --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
> /----------------------------------------------------------------------/
>
> I created a PR to fix this [3]. We may need more discussion about this;
> any suggestions are welcome. Thanks very much :)
>
> Current state:
> @bhasudha : +1
> @vinoth : -1
>
> [1] http://spark.apache.org/docs/latest/sql-data-sources-avro.html
> [2] http://mirror.bit.edu.cn/apache/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
> [3] https://github.com/apache/incubator-hudi/pull/1290
