Re:Re: [DISCUSS] Relocate spark-avro dependency by maven-shade-plugin

lamberken Wed, 05 Feb 2020 15:25:41 -0800


Hi @bhasudha,



No need to apologize. I think this discussion is meaningful for the Hudi project.


Thanks,
Lamber-Ken











At 2020-02-06 07:07:49, "Bhavani Sudha" <[email protected]> wrote:
>Hi @lamberken, sorry I missed this earlier. I also left this comment
>in the PR. I think Vinoth brings up a valid point. Although your PR intends
>to spare users from having to care about Scala 2.11 vs. Scala 2.12, we
>also need to avoid coupling Hudi with specific spark-avro versions, be it
>2.4.4 or 3.0-preview2.
>
>Please consider my vote as -1.
>
>Thanks,
>Sudha
>
>On Wed, Feb 5, 2020 at 2:11 PM lamberken <[email protected]> wrote:
>
>>
>>
>> Dear team,
>>
>>
>> With the 0.5.1 version released, users need to add
>> `org.apache.spark:spark-avro_2.11:2.4.4` when starting a Hudi command, like
>> below
>>
>> /-------------------------------------------------------------------------------------------------------------------------------------------------------------/
>> spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
>>   --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating,org.apache.spark:spark-avro_2.11:2.4.4 \
>>   --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
>>
>> /-------------------------------------------------------------------------------------------------------------------------------------------------------------/
>>
>>
>> From the spark-avro guide[1], we know that the spark-avro module is external;
>> it does not exist in spark-2.4.4-bin-hadoop2.7.tgz[2].
>> So maybe it's better to relocate the spark-avro dependency using
>> maven-shade-plugin. If so, users will start Hudi the same way as with the
>> 0.5.0 version.
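
For illustration, a relocation of this kind might look roughly like the
following maven-shade-plugin fragment in the bundle's pom.xml. This is only a
sketch: the shaded prefix `org.apache.hudi.` and the choice to relocate the
`org.apache.spark.sql.avro.` package are assumptions, and the actual change in
the PR may differ.

```xml
<!-- Sketch only: bundle spark-avro into the shaded jar and move its classes
     under a Hudi-specific package so they cannot clash with a spark-avro
     already on the user's classpath. The shaded prefix is an assumption. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <artifactSet>
          <includes>
            <!-- Pull the spark-avro classes into the bundle jar -->
            <include>org.apache.spark:spark-avro_2.11</include>
          </includes>
        </artifactSet>
        <relocations>
          <relocation>
            <!-- Rewrite the package name and all references to it -->
            <pattern>org.apache.spark.sql.avro.</pattern>
            <shadedPattern>org.apache.hudi.org.apache.spark.sql.avro.</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With a setup like this, the bundle would be pinned to whichever spark-avro
version it was built against, which is the coupling concern raised in this
thread.
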
>>
>> /-------------------------------------------------------------------------------------------------------------------------------------------------------------/
>> spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
>>   --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating \
>>   --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
>>
>> /-------------------------------------------------------------------------------------------------------------------------------------------------------------/
>>
>>
>> I created a PR to address this[3]. We may need more discussion about
>> it; any suggestions are welcome. Thanks very much :)
>> Current state:
>> @bhasudha : +1
>> @vinoth       : -1
>>
>>
>> [1] http://spark.apache.org/docs/latest/sql-data-sources-avro.html
>> [2] http://mirror.bit.edu.cn/apache/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
>> [3] https://github.com/apache/incubator-hudi/pull/1290
>>
>>
