Re: Dropping Apache Spark Hadoop2 Binary Distribution?

Yang,Jie(INF) Mon, 03 Oct 2022 21:50:00 -0700

Hi, Dongjoon

Our company (Baidu) is still using the combination of Spark 3.3 + Hadoop 2.7.4 
in the production environment. Our Hadoop 2.7.4 is an internally maintained 
version compiled with Java 8. Although we are using Hadoop 2, I still support 
this proposal because it is positive and exciting.


Regards,
YangJie

From: Dongjoon Hyun <[email protected]>
Date: Tuesday, October 4, 2022 11:16
To: dev <[email protected]>
Subject: Dropping Apache Spark Hadoop2 Binary Distribution?

Hi, All.

I'm wondering whether the following Apache Spark Hadoop2 Binary Distribution
is still used by anyone in the community. If it's not used or not useful,
we may remove it from the Apache Spark 3.4.0 release.

    
    https://downloads.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop2.tgz

Here is the background of this question.
Since Apache Spark 2.2.0 (SPARK-19493, SPARK-19550), the Apache
Spark community has been building and releasing with Java 8 only.
I believe that user applications also use Java 8+ these days.
Recently, I received the following message from the Hadoop PMC.

  > "if you really want to claim hadoop 2.x compatibility, then you have to
  > be building against java 7". Otherwise a lot of people with hadoop 2.x
  > clusters won't be able to run your code. If your projects are java8+
  > only, then they are implicitly hadoop 3.1+, no matter what you use
  > in your build. Hence: no need for branch-2 branches except
  > to complicate your build/test/release processes [1]

If the Hadoop2 binary distribution is no longer used as of today,
or is incomplete somewhere because it is built with Java 8, the following
three existing alternative Hadoop 3 binary distributions could be
a better official solution for old Hadoop 2 clusters.

    1) Scala 2.12 and without-hadoop distribution
    2) Scala 2.12 and Hadoop 3 distribution
    3) Scala 2.13 and Hadoop 3 distribution

In short, is anyone using the Apache Spark 3.3.0 Hadoop2 binary distribution?

Dongjoon

[1] 
https://issues.apache.org/jira/browse/ORC-1251?focusedCommentId=17608247&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17608247
