Dear team,


There has already been some discussion of this topic in the PR ([3] in the 
quoted message below). Please read through it carefully before chiming in, thanks.


Current State:
Lamber-Ken: +1
Udit Mehrotra: +1
Bhavani Sudha: -1
Vinoth Chandar: -1


Thanks,
Lamber-Ken



At 2020-02-06 06:10:52, "lamberken" <[email protected]> wrote:
>
>
>Dear team,
>
>
>With the 0.5.1 release, users need to add 
>`org.apache.spark:spark-avro_2.11:2.4.4` when starting Hudi commands, as shown 
>below:
>/-------------------------------------------------------------------------------------------------------------------------------------------------------------/
>spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
>  --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating,org.apache.spark:spark-avro_2.11:2.4.4 \
>  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
>/-------------------------------------------------------------------------------------------------------------------------------------------------------------/
>
>
>From the spark-avro guide[1], we know that the spark-avro module is external 
>and is not included in spark-2.4.4-bin-hadoop2.7.tgz[2].
>So it may be better to relocate the spark-avro dependency using 
>maven-shade-plugin. If so, users could start Hudi the same way as with version 0.5.0:
>/-------------------------------------------------------------------------------------------------------------------------------------------------------------/
>spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
>  --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating \
>  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
>/-------------------------------------------------------------------------------------------------------------------------------------------------------------/
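>
>As a rough sketch only, a relocation of this kind in the bundle's pom.xml might 
>look something like the fragment below. The shaded package prefix 
>(org.apache.hudi.) is a hypothetical choice for illustration and is not 
>necessarily what the PR[3] does; it also assumes spark-avro is declared as a 
>compile-scope dependency so that the shade plugin can include it.
>/-------------------------------------------------------------------------------------------------------------------------------------------------------------/
><plugin>
>  <groupId>org.apache.maven.plugins</groupId>
>  <artifactId>maven-shade-plugin</artifactId>
>  <executions>
>    <execution>
>      <phase>package</phase>
>      <goals>
>        <goal>shade</goal>
>      </goals>
>      <configuration>
>        <artifactSet>
>          <includes>
>            <!-- pull the spark-avro classes into the bundle jar -->
>            <include>org.apache.spark:spark-avro_2.11</include>
>          </includes>
>        </artifactSet>
>        <relocations>
>          <relocation>
>            <!-- package that holds the spark-avro classes -->
>            <pattern>org.apache.spark.sql.avro.</pattern>
>            <!-- hypothetical shaded prefix, chosen only for this example -->
>            <shadedPattern>org.apache.hudi.org.apache.spark.sql.avro.</shadedPattern>
>          </relocation>
>        </relocations>
>      </configuration>
>    </execution>
>  </executions>
></plugin>
>/-------------------------------------------------------------------------------------------------------------------------------------------------------------/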
>
>
>I created a PR to fix this[3]. We may need more discussion about it, and 
>any suggestions are welcome. Thanks very much :)
>Current state:
>@bhasudha : +1
>@vinoth : -1
>
>
>[1] http://spark.apache.org/docs/latest/sql-data-sources-avro.html
>[2] http://mirror.bit.edu.cn/apache/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
>[3] https://github.com/apache/incubator-hudi/pull/1290
>
