Hi @bhasudha,

No need to apologize; I think this discussion is meaningful for the Hudi project.

Thanks,
Lamber-Ken

At 2020-02-06 07:07:49, "Bhavani Sudha" <[email protected]> wrote:
>Hi @lamberken,
>
>Sorry I missed seeing this earlier. I also left this comment in the PR.
>I think Vinoth brings up a valid point. Although your PR intends to make
>it easier for users to not care about Scala 2.11 or Scala 2.12, we also
>need to avoid coupling Hudi with specific spark-avro versions, be it
>2.4.4 or 3.0-preview2.
>
>Please consider my vote as -1.
>
>Thanks,
>Sudha
>
>On Wed, Feb 5, 2020 at 2:11 PM lamberken <[email protected]> wrote:
>
>> Dear team,
>>
>> With the 0.5.1 version released, users need to add
>> `org.apache.spark:spark-avro_2.11:2.4.4` when starting a Hudi command,
>> like below:
>>
>> /-------------------------------------------------------------------------------------------------------------------------------------------------------------/
>> spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
>>   --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating,org.apache.spark:spark-avro_2.11:2.4.4 \
>>   --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
>> /-------------------------------------------------------------------------------------------------------------------------------------------------------------/
>>
>> From the spark-avro guide[1], we know that the spark-avro module is
>> external; it is not included in spark-2.4.4-bin-hadoop2.7.tgz.
>> So it may be better to relocate the spark-avro dependency by using
>> maven-shade-plugin. If so, users could start Hudi the same way as with
>> version 0.5.0:
>>
>> /-------------------------------------------------------------------------------------------------------------------------------------------------------------/
>> spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
>>   --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating \
>>   --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
>> /-------------------------------------------------------------------------------------------------------------------------------------------------------------/
>>
>> I created a PR to fix this[3]. We may need more discussion about this;
>> any suggestion is welcome, thanks very much :)
>>
>> Current state:
>> @bhasudha : +1
>> @vinoth : -1
>>
>> [1] http://spark.apache.org/docs/latest/sql-data-sources-avro.html
>> [2] http://mirror.bit.edu.cn/apache/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
>> [3] https://github.com/apache/incubator-hudi/pull/1290
