Rui Li commented on HIVE-14029:
Hmm, even with a shim layer, it's difficult to support different Spark versions
if backward compatibility is not maintained between minor releases of Spark.
I'm wondering if the Spark used by Hive can be treated as a kind of embedded
binary that is used exclusively for HoS (Hive on Spark). On the Hive side, we
just need to set spark.home to point to this Spark. Users' other Spark
applications, e.g. Spark SQL or streaming, can still run against whatever Spark
they currently have in the cluster. Would this make the upgrade easier?
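As a rough sketch of the embedded-Spark idea, the Hive side could point at a bundled Spark build via spark.home in hive-site.xml; the property name is what HoS already reads, but the path below is purely illustrative:

```xml
<!-- Hypothetical hive-site.xml fragment: point Hive on Spark at an
     embedded Spark distribution shipped alongside Hive. The path shown
     is an illustrative placeholder, not a real layout. -->
<property>
  <name>spark.home</name>
  <value>/opt/hive/embedded-spark-2.0.0</value>
</property>
```

With this, cluster-wide SPARK_HOME and the Spark used by other applications would stay untouched.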
I think we also need to be more careful about upgrading Spark in the future
when the upgrade breaks compatibility. For such an upgrade, we first need to
make sure there is no obvious regression in functionality or performance.
> Update Spark version to 2.0.0
> Key: HIVE-14029
> URL: https://issues.apache.org/jira/browse/HIVE-14029
> Project: Hive
> Issue Type: Bug
> Reporter: Ferdinand Xu
> Assignee: Ferdinand Xu
> Labels: Incompatible, TODOC2.2
> Fix For: 2.2.0
> Attachments: HIVE-14029.1.patch, HIVE-14029.2.patch,
> HIVE-14029.3.patch, HIVE-14029.4.patch, HIVE-14029.5.patch,
> HIVE-14029.6.patch, HIVE-14029.7.patch, HIVE-14029.8.patch, HIVE-14029.patch
> Spark 2.0.0 includes quite a few new optimizations. We need to bump
> Spark to 2.0.0 to benefit from those performance improvements.
> To update the Spark version to 2.0.0, the following changes are required:
> * Spark API updates:
> ** SparkShuffler#call returns Iterator instead of Iterable
> ** SparkListener -> JavaSparkListener
> ** InputMetrics constructor doesn’t accept readMethod
> ** Methods remoteBlocksFetched and localBlocksFetched in ShuffleReadMetrics
> return long instead of int
> * Dependency upgrade:
> ** Jackson: 2.4.2 -> 2.6.5
> ** Netty version: 4.0.23.Final -> 4.0.29.Final
> ** Scala binary version: 2.10 -> 2.11
> ** Scala version: 2.10.4 -> 2.11.8
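To illustrate two of the API updates above (the Iterable-to-Iterator change in SparkShuffler#call and the int-to-long widening of the block-fetch metrics), here is a minimal, self-contained sketch. The interfaces below are hypothetical stand-ins with the same shape as the affected Spark APIs, since the real classes are not on the classpath here:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class Spark2ApiSketch {

    // Spark 1.x style: call() returned Iterable<T>.
    interface OldShuffler<T> { Iterable<T> call(List<T> input); }

    // Spark 2.0 style: call() returns Iterator<T> instead.
    interface NewShuffler<T> { Iterator<T> call(List<T> input); }

    // Adapting an Iterable-producing implementation is a one-line change:
    // return the iterable's iterator rather than the iterable itself.
    static <T> NewShuffler<T> adapt(OldShuffler<T> old) {
        return input -> old.call(input).iterator();
    }

    // The metric accessors widened from int to long in 2.0; callers that
    // stored the result in an int need a long variable (or an explicit cast).
    static long totalBlocksFetched(long remoteBlocksFetched,
                                   long localBlocksFetched) {
        return remoteBlocksFetched + localBlocksFetched;
    }

    public static void main(String[] args) {
        OldShuffler<String> old = input -> input;  // identity "shuffle"
        Iterator<String> it = adapt(old).call(Arrays.asList("a", "b"));
        StringBuilder sb = new StringBuilder();
        while (it.hasNext()) sb.append(it.next());
        System.out.println(sb);                          // prints "ab"
        System.out.println(totalBlocksFetched(3L, 4L));  // prints 7
    }
}
```

The adapter pattern above is roughly what a shim layer would do per supported Spark version: keep Hive's own interface stable and translate at the boundary.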