Thank you all. SPARK-40651 has been merged to the Apache Spark master branch for Apache Spark 3.4.0 now.
Dongjoon.

On Wed, Oct 5, 2022 at 3:24 PM L. C. Hsieh <vii...@gmail.com> wrote:
> +1
>
> Thanks Dongjoon.
>
> On Wed, Oct 5, 2022 at 3:11 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
> >
> > +1
> >
> > On Thu, Oct 6, 2022 at 5:59 AM Chao Sun <sunc...@apache.org> wrote:
> >>
> >> +1
> >>
> >> > and specifically may allow us to finally move off of the ancient version of Guava (?)
> >>
> >> I think the Guava issue comes from the Hive 2.3 dependency, not Hadoop.
> >>
> >> On Wed, Oct 5, 2022 at 1:55 PM Xinrong Meng <xinrong.apa...@gmail.com> wrote:
> >>>
> >>> +1.
> >>>
> >>> On Wed, Oct 5, 2022 at 1:53 PM Xiao Li <lix...@databricks.com.invalid> wrote:
> >>>>
> >>>> +1.
> >>>>
> >>>> Xiao
> >>>>
> >>>> On Wed, Oct 5, 2022 at 12:49 PM Sean Owen <sro...@gmail.com> wrote:
> >>>>>
> >>>>> I'm OK with this. It simplifies maintenance a bit, and specifically may allow us to finally move off of the ancient version of Guava (?).
> >>>>>
> >>>>> On Mon, Oct 3, 2022 at 10:16 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
> >>>>>>
> >>>>>> Hi, All.
> >>>>>>
> >>>>>> I'm wondering whether the following Apache Spark Hadoop 2 binary distribution
> >>>>>> is still used by anyone in the community. If it is not used or not useful,
> >>>>>> we may remove it from the Apache Spark 3.4.0 release.
> >>>>>>
> >>>>>> https://downloads.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop2.tgz
> >>>>>>
> >>>>>> Here is the background of this question.
> >>>>>> Since Apache Spark 2.2.0 (SPARK-19493, SPARK-19550), the Apache
> >>>>>> Spark community has been building and releasing with Java 8 only.
> >>>>>> I believe that user applications also use Java 8+ these days.
> >>>>>> Recently, I received the following message from the Hadoop PMC:
> >>>>>>
> >>>>>> > "If you really want to claim hadoop 2.x compatibility, then you have to
> >>>>>> > be building against java 7." Otherwise a lot of people with hadoop 2.x
> >>>>>> > clusters won't be able to run your code. If your projects are java8+
> >>>>>> > only, then they are implicitly hadoop 3.1+, no matter what you use
> >>>>>> > in your build. Hence: no need for branch-2 branches except
> >>>>>> > to complicate your build/test/release processes. [1]
> >>>>>>
> >>>>>> If the Hadoop 2 binary distribution is no longer used as of today,
> >>>>>> or is incomplete somewhere due to the Java 8 build, the following three
> >>>>>> existing alternative Hadoop 3 binary distributions could be
> >>>>>> the better official solution for old Hadoop 2 clusters:
> >>>>>>
> >>>>>> 1) Scala 2.12 and without-hadoop distribution
> >>>>>> 2) Scala 2.12 and Hadoop 3 distribution
> >>>>>> 3) Scala 2.13 and Hadoop 3 distribution
> >>>>>>
> >>>>>> In short, is there anyone who is using the Apache Spark 3.3.0 Hadoop 2 binary distribution?
> >>>>>>
> >>>>>> Dongjoon
> >>>>>>
> >>>>>> [1] https://issues.apache.org/jira/browse/ORC-1251?focusedCommentId=17608247&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17608247

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
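[Editor's note] For readers wondering how the "without-hadoop" alternative listed above is used in practice: Spark's "Hadoop Free" build documentation describes pointing such a distribution at a cluster's own Hadoop jars via SPARK_DIST_CLASSPATH. A minimal sketch follows; the unpack directory name is an assumption, and it presumes the cluster's `hadoop` command is on PATH.

```shell
# Sketch only: run from an unpacked without-hadoop Spark distribution.
# The directory name below is a hypothetical example, not an official path.
cd spark-3.3.0-bin-without-hadoop

# Let Spark pick up the cluster's own Hadoop jars at runtime,
# as described in Spark's "Hadoop Free" build docs.
export SPARK_DIST_CLASSPATH="$(hadoop classpath)"

# Spark commands (spark-shell, spark-submit, ...) now resolve Hadoop
# classes from the cluster's installation rather than bundled jars.
./bin/spark-shell
```

This is configuration rather than a program, so whether it works depends entirely on the local cluster layout.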