Re: [DISCUSS] Relocate spark-avro dependency by maven-shade-plugin

Bhavani Sudha Wed, 05 Feb 2020 15:08:41 -0800

Hi @lamberken, sorry I missed this earlier. I also left this comment
on the PR. I think Vinoth brings up a valid point. Although your PR intends
to spare users from caring about Scala 2.11 vs. Scala 2.12, we
also need to avoid coupling Hudi to a specific spark-avro version, be it
2.4.4 or 3.0-preview2.
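
For context, a relocation of the kind proposed would look roughly like the
sketch below. This is only an illustration: the shaded package prefix and the
artifact include are my assumptions, not copied from the PR. Note that the
bundle would then embed whichever spark-avro version the build pulls in.

```xml
<!-- Sketch only: hypothetical shade configuration, not the PR's actual change -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <artifactSet>
          <includes>
            <!-- Pulls the spark-avro classes into the bundle jar -->
            <include>org.apache.spark:spark-avro_2.11</include>
          </includes>
        </artifactSet>
        <relocations>
          <relocation>
            <!-- Rewrites the package so it cannot clash with a user-supplied spark-avro -->
            <pattern>org.apache.spark.sql.avro.</pattern>
            <!-- Assumed target prefix, chosen for illustration -->
            <shadedPattern>org.apache.hudi.org.apache.spark.sql.avro.</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```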


Please consider my vote as -1.

Thanks,
Sudha

On Wed, Feb 5, 2020 at 2:11 PM lamberken <[email protected]> wrote:

>
>
> Dear team,
>
>
> With the 0.5.1 release, users need to add
> `org.apache.spark:spark-avro_2.11:2.4.4` when starting Hudi, as shown
> below:
>
> /-------------------------------------------------------------------------------------------------------------------------------------------------------------/
> spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
>   --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating,org.apache.spark:spark-avro_2.11:2.4.4 \
>   --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
>
> /-------------------------------------------------------------------------------------------------------------------------------------------------------------/
>
>
> From the spark-avro guide [1], we know that the spark-avro module is external;
> it is not included in spark-2.4.4-bin-hadoop2.7.tgz [2].
> So it may be better to relocate the spark-avro dependency using
> maven-shade-plugin. If so, users could start Hudi the same way as with 0.5.0:
>
> /-------------------------------------------------------------------------------------------------------------------------------------------------------------/
> spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
>   --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating \
>   --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
>
> /-------------------------------------------------------------------------------------------------------------------------------------------------------------/
>
>
> I created a PR for this [3]. We may need more discussion about
> it; any suggestions are welcome. Thanks very much :)
> Current state:
> @bhasudha : +1
> @vinoth       : -1
>
>
> [1] http://spark.apache.org/docs/latest/sql-data-sources-avro.html
> [2]
> http://mirror.bit.edu.cn/apache/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
> [3] https://github.com/apache/incubator-hudi/pull/1290
>
>
