Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

Xiao Li Fri, 01 Nov 2019 09:47:25 -0700

Hi, Steve,

Thanks for your comments! My main quality concern is not with Hadoop 3.2
itself. In this release, the Hive execution module upgrade [from 1.2 to 2.3],
the Hive thrift-server upgrade, and JDK 11 support are added to the Hadoop 3.2
profile only. Compared with the Hadoop 2.x profile, the Hadoop 3.2 profile is
riskier because of these changes.


To speed up the adoption of Spark 3.0, which has many other highly
desirable features, I am proposing to keep the Hadoop 2.x profile as the
default.

Cheers,

Xiao.



On Fri, Nov 1, 2019 at 5:33 AM Steve Loughran <[email protected]> wrote:

> What is the current default value? The 2.x releases are reaching EOL:
> 2.7 is dead, there might be a 2.8.x, and for now 2.9 is the branch-2 release
> getting attention. 2.10.0 shipped yesterday, but the ".0" means there will
> inevitably be surprises.
>
> One issue with using older versions is that any problem reported
> (especially one with stack traces you can blame me for) will generally be met
> with a response of "does it go away when you upgrade?" The other issue is how
> much test coverage things are getting.
>
> W.r.t. Hadoop 3.2 stability, nothing major has been reported. The ABFS
> client is there, and the big Guava update (HADOOP-16213) went in. People
> will either love or hate that.
>
> No major changes in the s3a code between 3.2.0 and 3.2.1; I have a large
> backport planned though, including changes to better handle AWS caching of
> 404s generated from HEAD requests before an object was actually created.
>
> It would be really good if the spark distributions shipped with later
> versions of the hadoop artifacts.
>
> On Mon, Oct 28, 2019 at 7:53 PM Xiao Li <[email protected]> wrote:
>
>> The stability and quality of the Hadoop 3.2 profile are unknown. The changes
>> are massive, including Hive execution and a new version of the Hive
>> thriftserver.
>>
>> To reduce the risk, I would like to keep the current default version
>> unchanged. When it becomes stable, we can change the default profile to
>> Hadoop-3.2.
>>
>> Cheers,
>>
>> Xiao
>>
>> On Mon, Oct 28, 2019 at 12:51 PM Sean Owen <[email protected]> wrote:
>>
>>> I'm OK with that, but don't have a strong opinion nor info about the
>>> implications.
>>> That said my guess is we're close to the point where we don't need to
>>> support Hadoop 2.x anyway, so, yeah.
>>>
>>> On Mon, Oct 28, 2019 at 2:33 PM Dongjoon Hyun <[email protected]>
>>> wrote:
>>> >
>>> > Hi, All.
>>> >
>>> > There was a discussion on publishing artifacts built with Hadoop 3.
>>> > But we are still publishing with Hadoop 2.7.3, and `3.0-preview` will
>>> > be the same because we haven't changed anything yet.
>>> >
>>> > Technically, we need to change two places for publishing.
>>> >
>>> > 1. Jenkins Snapshot Publishing
>>> >
>>> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/
>>> >
>>> > 2. Release Snapshot/Release Publishing
>>> >
>>> https://github.com/apache/spark/blob/master/dev/create-release/release-build.sh
>>> >
>>> > To minimize the change, we need to switch our default Hadoop profile.
>>> >
>>> > Currently, the default is the `hadoop-2.7 (2.7.4)` profile and `hadoop-3.2
>>> > (3.2.0)` is optional.
>>> > We had better use the `hadoop-3.2` profile by default and `hadoop-2.7`
>>> > optionally.
>>> >
>>> > Note that this means we use Hive 2.3.6 by default. Only the `hadoop-2.7`
>>> > distribution will use `Hive 1.2.1`, like Apache Spark 2.4.x.
>>> >
>>> > Bests,
>>> > Dongjoon.

