On Thu, 18 May 2023 at 03:45, Cheng Pan <cheng...@apache.org> wrote:

> Steve, thanks for the information. I think HADOOP-17046 should be fine for
> the Spark case.
>
> Hadoop puts protobuf 3 into the pre-shaded hadoop-thirdparty artifact, and
> hadoop-client-runtime shades protobuf 2 during packaging, which results in
> protobuf 2 and 3 co-existing in hadoop-client-runtime under different packages:
>
> - protobuf 2: org.apache.hadoop.shaded.com.google.protobuf
> - protobuf 3: org.apache.hadoop.thirdparty.protobuf
>
oh, so in fact that "put it back in unshaded" change doesn't do anything
useful through the hadoop-client lib; it is effectively useless there.
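
a quick way to see which copies a given classpath actually carries is to
probe for the relocated Message classes by name. a minimal sketch, assuming
nothing beyond the two relocated package names quoted above (the choice of
Message as the probe class and the vanilla package are my additions):

    // Probe which protobuf copies are visible on the current classpath.
    // Names: the vanilla package, the protobuf 2 copy relocated by
    // hadoop-client-runtime, and the protobuf 3 copy from hadoop-thirdparty.
    public class ProtobufProbe {
      public static void main(String[] args) {
        String[] names = {
            "com.google.protobuf.Message",
            "org.apache.hadoop.shaded.com.google.protobuf.Message",
            "org.apache.hadoop.thirdparty.protobuf.Message"
        };
        for (String name : names) {
          try {
            Class.forName(name);
            System.out.println("present: " + name);
          } catch (ClassNotFoundException e) {
            System.out.println("missing: " + name);
          }
        }
      }
    }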

>
> As HADOOP-18487 plans to mark protobuf 2 optional, will this mean that
> hadoop-client-runtime no longer ships protobuf 2? If so, things become worse
> for downstream projects that consume the Hadoop shaded client, like Spark,
> because users would have to add a vanilla protobuf 2 jar to the classpath if
> they want to access those APIs.
>

Well, what applications are actually using
org.apache.hadoop.shaded.com.google.protobuf? hadoop itself doesn't; it's
only referenced in unshaded form because hbase wanted the IPC library to
still work with the unshaded version they were still using. But if the
protobuf 2 lib is now only available shaded, their compiled .class files
aren't going to link to it, are they?
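
to make that linkage point concrete: anything compiled against the vanilla
artifact has the literal com.google.protobuf names baked into its .class
files, so relocation inside hadoop-client-runtime can't satisfy it. a sketch
of what such a caller looks like; the class and method names here are made
up for illustration, only the import is the point:

    // Hypothetical consumer compiled against vanilla protobuf 2.x.
    // The compiled .class file references "com/google/protobuf/Message"
    // directly; relocation in hadoop-client-runtime only rewrites hadoop's
    // own classes, not this one, so at runtime this needs an unshaded
    // protobuf 2 jar on the classpath or it dies with NoClassDefFoundError.
    import com.google.protobuf.Message;

    public class UnshadedCaller {
      public static String describe(Message m) {
        return m.getClass().getName() + ": " + m.getSerializedSize() + " bytes";
      }
    }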

does anyone know how spark + hbase + hadoop-client-runtime work together so
that spark can talk to an hbase server? especially: what is needed on the
classpath, and what actually gets loaded for a call?

>
> In summary, I think the current state is fine. But for security purposes,
> the Hadoop community may want to remove the EOL protobuf 2 classes from
> hadoop-client-runtime.
>

+1. The shaded copy that is in use also needs upgrading.


> Thanks,
> Cheng Pan
>
>
> On May 17, 2023 at 04:10:43, Dongjoon Hyun <dongj...@apache.org> wrote:
>
>> Thank you for sharing, Steve.
>>
>> Dongjoon
>>
>> On Tue, May 16, 2023 at 11:44 AM Steve Loughran
>> <ste...@cloudera.com.invalid> wrote:
>>
>>> I have some bad news here: even though hadoop cut protobuf 2.5 support,
>>> the hbase team put it back in (HADOOP-17046). I don't know if the shaded
>>> hadoop client has removed that dependency on protobuf 2.5.
>>>
>>> In HADOOP-18487 I want to allow hadoop to cut that dependency, with
>>> hbase having to add it to the classpath if they still want it:
>>> https://github.com/apache/hadoop/pull/4996
>>>
>>> It's been neglected; if you can help with review/test etc., that'd be
>>> great. I'd love to get this into the 3.3.6 release.
>>>
>>> On Sat, 13 May 2023 at 08:36, Cheng Pan <cheng...@apache.org> wrote:
>>>
>>>> Hi all,
>>>>
>>>> In SPARK-42452 (apache/spark#41153 [1]), I’m trying to remove protobuf
>>>> 2.5.0 from the Spark dependencies.
>>>>
>>>> Spark does not use protobuf 2.5.0 directly; it comes in via other
>>>> dependencies. With the following changes, Spark no longer requires
>>>> protobuf 2.5.0.
>>>>
>>>> - SPARK-40323 upgraded ORC to 1.8.0, which moved from protobuf 2.5.0 to a
>>>> shaded protobuf 3
>>>>
>>>> - SPARK-33212 switched from the vanilla Hadoop client to the Hadoop shaded
>>>> client, which also removed the protobuf 2 dependency. SPARK-42452 removed
>>>> support for Hadoop 2.
>>>>
>>>> - SPARK-14421 shaded and relocated protobuf 2.6.1, which is required by
>>>> the kinesis client, into the kinesis assembly jar
>>>>
>>>> - Spark's own core/connect/protobuf modules use protobuf 3, and all
>>>> protobuf 3 deps are shaded and relocated.
>>>>
>>>> Feel free to comment if you still have any concerns.
>>>>
>>>> [1] https://github.com/apache/spark/pull/41153
>>>>
>>>> Thanks,
>>>> Cheng Pan
>>>>
>>>
