Thanks for raising this! +1 to @Udit Mehrotra's point.
It's right to recommend that users build their own Hudi jars against the Spark version they use. That avoids compatibility issues between a user's local jars and the Spark version Hudi is pre-built with (2.4.4). Alternatively, could we remove "org.apache.spark:spark-avro_2.11:2.4.4" from the instructions? A user's local environment will already contain that external dependency if they use Avro. If not, running Hudi (release-0.5.1) is more complex for me; by comparison, Delta Lake is simpler: just "bin/spark-shell --packages io.delta:delta-core_2.11:0.5.0". (A rough sketch of the build-your-own-bundle approach is appended below the quoted mail.)

------------------ Original ------------------
From: "lamberken" <[email protected]>
Date: Thu, Feb 6, 2020 07:42 AM
To: "dev" <[email protected]>
Subject: Re: [DISCUSS] Relocate spark-avro dependency by maven-shade-plugin

Dear team,

About this topic, there are some previous discussions in PR [1]. It's better to read them carefully before chiming in, thanks.

Current state:
Lamber-Ken: +1
Udit Mehrotra: +1
Bhavani Sudha: -1
Vinoth Chandar: -1

Thanks,
Lamber-Ken

At 2020-02-06 06:10:52, "lamberken" <[email protected]> wrote:
>
>Dear team,
>
>With the 0.5.1 version released, users need to add `org.apache.spark:spark-avro_2.11:2.4.4` when starting a Hudi session, like below:
>/-----------------------------------------------------------------------------/
>spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
>  --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating,org.apache.spark:spark-avro_2.11:2.4.4 \
>  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
>/-----------------------------------------------------------------------------/
>
>From the spark-avro guide [1], we know that the spark-avro module is external; it is not included in spark-2.4.4-bin-hadoop2.7.tgz [2].
>So it may be better to relocate the spark-avro dependency using maven-shade-plugin. If so, users would start Hudi the same way the 0.5.0 version does:
>/-----------------------------------------------------------------------------/
>spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
>  --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating \
>  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
>/-----------------------------------------------------------------------------/
>
>I created a PR to fix this [3]. We may need more discussion about it; any suggestion is welcome, thanks very much :)
>Current state:
>@bhasudha: +1
>@vinoth: -1
>
>[1] http://spark.apache.org/docs/latest/sql-data-sources-avro.html
>[2] http://mirror.bit.edu.cn/apache/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
>[3] https://github.com/apache/incubator-hudi/pull/1290
>
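
For reference, here is a rough sketch of the build-your-own-bundle approach mentioned above. The tag name, the -Dspark.version property, and the jar path are assumptions on my side; please verify them against the 0.5.1 source tree and its root pom.xml before relying on them.
/-----------------------------------------------------------------------------/
# Clone the source and build the spark bundle against the Spark version you run.
# Assumptions: the release tag name and the -Dspark.version property; check the
# repository and root pom.xml for the exact values.
git clone https://github.com/apache/incubator-hudi.git
cd incubator-hudi
git checkout release-0.5.1-incubating
mvn clean package -DskipTests -DskipITs -Dspark.version=2.4.4

# Start spark-shell with the locally built bundle instead of --packages.
# The exact jar path/name may differ depending on the build.
spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
  --jars packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.11-0.5.1-incubating.jar \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
/-----------------------------------------------------------------------------/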

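And a quick way to check whether a given hudi-spark-bundle jar actually ships (or relocates) the spark-avro classes, which is what the shading discussion hinges on; the jar path is the same assumption as above.
/-----------------------------------------------------------------------------/
# List the bundle contents and look for spark-avro classes
# (package org.apache.spark.sql.avro). If the classes were relocated by
# maven-shade-plugin they would show up under a shaded prefix; an empty result
# means the bundle does not ship spark-avro and it must be supplied separately,
# e.g. via --packages org.apache.spark:spark-avro_2.11:2.4.4.
jar tf packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.11-0.5.1-incubating.jar \
  | grep -i 'spark/sql/avro'
/-----------------------------------------------------------------------------/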