Hey all, I have been out of the game for a while, but I am getting active again. My opinion is "out with the old, in with the new." I often work in regulated environments. The first thing they look at is the OSS vulnerabilities, and the problem is nightmare level. See all the red vulnerability flags here: https://mvnrepository.com/artifact/org.apache.hive/hive-exec?p=2
I am building Hadoop on Alpine. hadoop-common, even in the latest Hadoop 3.4.2, still wants to use protobuf 2.5.0. You can't even find a version of Alpine that has that protobuf! That's how old it is. That dependency then forces everything downstream to have the same problem: Hive has to include a protobuf lib that is 6 years old to keep up with a Hadoop that is 8 years old. Protobuf 2.5.0 is this old: https://groups.google.com/g/protobuf/c/CZngjTrQqdI?pli=1

"This is how Spark does it, which is also the main reason why users are more likely to adopt Spark as a SQL engine"

Spark somehow still depends on Hive 2.3.x: https://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.13/3.5.7. But if you read between the lines, that is going away: HMS is end of life, Unity Catalog goes forward. https://community.databricks.com/t5/data-engineering/hive-metastore-end-of-life/td-p/136152

No disrespect to any of the platform maintainers; I understand their work and its value. But "HDP", "CDH", "Bigtop" and all the distros should not be Hive's target. Targeting them results in this insane backporting, supporting protobuf from 2013 so it can run on Rocky Linux. No one builds software like this anymore. StarRocks (https://www.starrocks.io/) is running on ubuntu latest.

Back in the day Hive built off master; there were no CDHs or HDPs. Then came the "shim layer", which was clever, but now it is a crutch: it makes everyone target the past. Literally targeting a protobuf version from 2013. All the DuckDB people brag on LinkedIn that they can query JSON. Hive is so unhip I have to jam it into the conversation :) If you see someone talking about a metastore, it is Nessie, Polaris, Unity Catalog, or Glue!

See this: https://issues.apache.org/jira/browse/HADOOP-19756 - a fortress of if statements to make it run on Sun and glibc Red Hat 4 :) If we keep maintaining this basura, we just continue the push into irrelevance.

It's really time to stop targeting the "distros"; they are fading fast. Make it build on master, make it build on Alpine.
https://github.com/edwardcapriolo/edgy-ansible/blob/main/imaging/hadoop/compositions/ha_rm_zk/compose.yml
Hip, Docker, features, winning! Not "compatibility with RH4 running on a mainframe, stable releases from a vendor."
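If anyone wants to see the protobuf pin for themselves, here is a quick sanity check (nothing official, and the exact output depends on the branch and platform you are on):

    # from a Hadoop (or Hive) source checkout: which protobuf-java does the build pull in?
    mvn dependency:tree -Dincludes=com.google.protobuf:protobuf-java

    # and on Alpine: what protobuf can apk actually give you?
    docker run --rm alpine:latest sh -c 'apk update -q && apk search protobuf'

You will not find anything close to 2.5.0 in that second list.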
Edward

On Wed, Oct 9, 2024 at 8:02 AM lisoda <[email protected]> wrote:

> HI TEAM.
>
> I would like to discuss with everyone the issue of running Hive4 in Hadoop
> environments below version 3.3.6. Currently, a large number of Hive users
> are still using low-version environments such as Hadoop 2.6/2.7/3.1.1. To
> be honest, upgrading Hadoop is a challenging task. We cannot force users to
> upgrade their Hadoop cluster versions just to use Hive4. In order to
> encourage these potential users to adopt and use Hive4, we need to provide
> a general solution that allows Hive4 to run on low-version Hadoop (at least
> we need to address the compatibility issues with Hadoop version 3.1.0).
> The general plan is as follows: In both the Hive and Tez projects, in
> addition to providing the existing tar packages, we should also provide tar
> packages that include high-version Hadoop dependencies. By defining
> configuration files, users can avoid using any jar package dependencies
> from the Hadoop cluster. In this way, users can initiate Tez tasks on
> low-version Hadoop clusters using only the built-in Hadoop dependencies.
> This is how Spark does it, which is also the main reason why users are
> more likely to adopt Spark as a SQL engine. Spark not only provides tar
> packages without Hadoop dependencies but also provides tar packages with
> built-in Hadoop 3 and Hadoop 2. Users can upgrade to a new version of Spark
> without upgrading the Hadoop version.
> We have implemented such a plan in our production environment, and we have
> successfully run Hive4.0.0 and Hive4.0.1 in the HDP 3.1.0 environment. They
> are currently working well.
> Based on our successful experience, I believe it is necessary for us to
> provide tar packages with all Hadoop dependencies built in. At the very
> least, we should document that users can successfully run Hive4 on
> low-version Hadoop in this way.
> However, my idea may not be mature enough, so I would like to know what
> others think. It would be great if someone could participate in this topic
> and discuss it.
>
>
> TKS.
> LISODA.
>
>
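For anyone trying to picture the mechanics lisoda describes: Tez already has the knobs for this. You upload a Tez tarball that bundles its own Hadoop jars and tell Tez not to pick up the cluster's Hadoop libraries. A rough sketch only; the HDFS path and tarball version below are made up, and the exact layout is whatever your environment needs:

    <!-- tez-site.xml (sketch): run Tez from an uploaded tarball that carries its own
         Hadoop jars, instead of the jars installed on the cluster nodes -->
    <property>
      <name>tez.lib.uris</name>
      <!-- the full Tez tarball (not the -minimal one); path is illustrative -->
      <value>hdfs:///apps/tez/tez-0.10.x.tar.gz</value>
    </property>
    <property>
      <name>tez.use.cluster.hadoop-libs</name>
      <!-- false = use the Hadoop libraries bundled in the tarball, not the cluster's -->
      <value>false</value>
    </property>

Spark draws the same line on its download page with both "pre-built for Hadoop x" binaries and a "Hadoop free" build; the proposal, as I read it, is for Hive and Tez to publish the bundled flavor officially instead of leaving each user to assemble it.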
