Dear team,
Regarding this topic, there was some earlier discussion in the PR [1]. Please read it carefully before chiming in, thanks.

Current state:
Lamber-Ken: +1
Udit Mehrotra: +1
Bhavani Sudha: -1
Vinoth Chandar: -1

Thanks,
Lamber-Ken

At 2020-02-06 06:10:52, "lamberken" <[email protected]> wrote:
>Dear team,
>
>With the 0.5.1 release, users need to add
>`org.apache.spark:spark-avro_2.11:2.4.4` when starting Hudi, as shown below:
>/------------------------------------------------------------/
>spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
>  --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating,org.apache.spark:spark-avro_2.11:2.4.4 \
>  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
>/------------------------------------------------------------/
>
>From the spark-avro guide [1], we know that the spark-avro module is external;
>it is not included in spark-2.4.4-bin-hadoop2.7.tgz [2].
>So it may be better to relocate the spark-avro dependency using the
>maven-shade-plugin. If so, users will be able to start Hudi as they did with
>version 0.5.0, without the extra package:
>/------------------------------------------------------------/
>spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
>  --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating \
>  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
>/------------------------------------------------------------/
>
>I created a PR to fix this [3]. We may need more discussion about it;
>any suggestions are welcome. Thanks very much :)
>
>Current state:
>@bhasudha : +1
>@vinoth : -1
>
>[1] http://spark.apache.org/docs/latest/sql-data-sources-avro.html
>[2] http://mirror.bit.edu.cn/apache/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
>[3] https://github.com/apache/incubator-hudi/pull/1290
