I haven't looked at it in detail...

Somebody's been trying to do that in
https://github.com/apache/spark/pull/20659, but that's kind of a huge
change.

The parts where I'd be concerned are:
- using Hive's original hive-exec package brings in a bunch of shaded
dependencies, which may break Spark in weird ways. HIVE-16391 was
supposed to fix that but nothing has really been done as part of that
bug.
- the hive-exec "core" package avoids the shaded dependencies but used
to have issues of its own. Maybe it's better now, haven't looked.
- what about the current thrift server which is basically a fork of
the Hive 1.2 source code?
- when using Hadoop 3 + an old metastore client that doesn't know
about Hadoop 3, things may break.

The last one has two possible fixes: say that Hadoop 3 builds of
Spark don't support old metastores; or add code so that Spark loads a
separate copy of Hadoop libraries in that case (search for
"sharesHadoopClasses" in IsolatedClientLoader for where to start with
that).
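
To make the second option a bit more concrete, here is a rough sketch of
the decision involved. It is Python for brevity and is not Spark code:
every function and parameter name below is made up for illustration, and
"client_max_hadoop_major" is an assumed piece of metadata that Spark does
not necessarily track today. The only thing taken from the real code is
the idea of a "sharesHadoopClasses" flag in IsolatedClientLoader.

```python
# Illustrative model only: these names are hypothetical, not Spark's API.


def is_hadoop_class(class_name: str) -> bool:
    """True for Hadoop classes, excluding Hive's org.apache.hadoop.hive.* tree."""
    return (class_name.startswith("org.apache.hadoop.")
            and not class_name.startswith("org.apache.hadoop.hive."))


def should_share_hadoop_classes(spark_hadoop_major: int,
                                client_max_hadoop_major: int) -> bool:
    """Share Spark's Hadoop classes only if the metastore client understands them.

    client_max_hadoop_major is an assumption: the newest Hadoop major version
    the chosen metastore client is known to work with.
    """
    return spark_hadoop_major <= client_max_hadoop_major


def loaded_from_spark_classpath(class_name: str,
                                shares_hadoop_classes: bool) -> bool:
    """Whether a class would resolve from Spark's own loader.

    Only the Hadoop part of the decision is modeled; everything else is
    treated as isolated here.
    """
    return shares_hadoop_classes and is_hadoop_class(class_name)


if __name__ == "__main__":
    # Hadoop 3 build of Spark talking to a client that only knows Hadoop 2.
    shares = should_share_hadoop_classes(3, 2)
    print(shares)
    print(loaded_from_spark_classpath("org.apache.hadoop.fs.FileSystem", shares))
```

When sharing is turned off, the old client would need its own copy of the
Hadoop jars on the isolated classpath, which is the part that would need
new code.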

If we try to update Hive, it would be good to avoid having to fork it,
as is done currently. But I'm not sure that will be possible given the
current hive-exec packaging.

On Mon, Apr 2, 2018 at 2:58 PM, Reynold Xin <r...@databricks.com> wrote:
> Is it difficult to upgrade Hive execution version to the latest version? The
> metastore used to be an issue but now that part has been separated from the
> execution part.
>
>
> On Mon, Apr 2, 2018 at 1:57 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
>>
>> Saisai filed SPARK-23534, but the main blocking issue is really
>> SPARK-18673.
>>
>>
>> On Mon, Apr 2, 2018 at 1:00 PM, Reynold Xin <r...@databricks.com> wrote:
>> > Does anybody know what needs to be done in order for Spark to support
>> > Hadoop 3?
>> >
>>
>>
>>
>> --
>> Marcelo
>
>



-- 
Marcelo
