What would be the strategy with Hive? Cherry-pick patches? Update to a more
“modern” version (like 2.3)?
I know of a few critical schema-evolution fixes that we could port to Hive
From: Steve Loughran <ste...@hortonworks.com>
Sent: Tuesday, April 3, 2018 1:33 PM
Subject: Re: Hadoop 3 support
To: Apache Spark Dev <firstname.lastname@example.org>
On 3 Apr 2018, at 01:30, Saisai Shao
Yes, the main blocking issue is that the Hive version used in Spark (1.2.1.spark)
doesn't support running on Hadoop 3; Hive checks the Hadoop version at
runtime. Besides this, I think some pom changes should be enough to support
Hadoop 3.
If we want to use the Hadoop 3 shaded client jar, then the pom requires lots of
changes, but this is not necessary.
2018-04-03 4:57 GMT+08:00 Marcelo Vanzin
Saisai filed SPARK-23534, but the main blocking issue is really SPARK-18673.
On Mon, Apr 2, 2018 at 1:00 PM, Reynold Xin
> Does anybody know what needs to be done in order for Spark to support Hadoop 3?
To be ruthless, I'd view Hadoop 3.1 as the first one to play with...3.0.x was
more of a wide-version check. Hadoop 3.1RC0 is out this week, making it the
ideal (last!) time to find showstoppers.
1. I've got a PR which adds a profile to build Spark against Hadoop 3, with
some fixes for the zk import along with a better hadoop-cloud profile.
Apply that patch and both mvn and sbt can build with the RC0 from the ASF:
build/sbt -Phadoop-3,hadoop-cloud,yarn -Psnapshots-and-staging
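For reference, the equivalent Maven invocation might look like the following sketch. The profile names are taken from the sbt command above; the exact profiles depend on the (unmerged) PR being applied, so treat this as illustrative:

```shell
# Sketch: Maven build with the same profiles as the sbt command above.
# Assumes the hadoop-3 profile from the PR is applied to the tree.
./build/mvn -Phadoop-3,hadoop-cloud,yarn -Psnapshots-and-staging -DskipTests clean package
```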
2. Everything Marcelo says about hive.
You can build Hadoop locally with a -Dhadoop.version=2.11 and the Hive
1.2.1.spark version check goes through. You can't safely bring up HDFS like
that, but you can run Spark standalone against things.
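Concretely, that workaround might be invoked as below. This is a sketch, not a supported configuration: the point is only that the version string handed to the build starts with "2", so the runtime check in the bundled Hive fork passes; the "2.11" value comes from the mail above.

```shell
# Sketch of the workaround above: build Spark against a Hadoop version
# string beginning with "2" so the Hive 1.2.1.spark runtime version
# check passes. Not safe for bringing up a real HDFS cluster.
./build/mvn -Pyarn -Dhadoop.version=2.11 -DskipTests package
```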
Short term: build a new hive-1.2.x-spark which fixes up the version check and
merges in those critical patches that Cloudera, Hortonworks, Databricks, and
anyone else have got in for their production systems. I don't think we have that
That leaves a "how to release" story, as the ASF will want it to come out under
the ASF auspices, and, given the liability disclaimers, so should everyone. The
Hive team could be "invited" to publish it as their own if people ask nicely.
-do something about that subclassing to get the Thrift endpoint to work; that
can include fixing Hive's service to be subclass-friendly.
-move to Hive 2
That's a major piece of work.