Hello Edward,
Yeah, your point is absolutely valid; my intention was never to drag along an ancient tech stack either.
In fact, the compatibility work is already done; we no longer need to compile against crusty old dependencies. What we are doing is simple: just avoid pulling in any libraries from the cluster environment at runtime. We bundle every runtime dependency into a self-contained lib ("self-lib"), which contains virtually zero legacy packages. For example, we ship Tez with the full Hadoop 3.4.1 libraries and run it on a Hadoop 3.1.0 YARN cluster without using any of the cluster's 3.1.0 jars; if something weird does surface, the user can debug it themselves.
As long as self-lib can call out to the external Hadoop/OSS environments, the compatibility box is ticked. The effort is tiny, and we basically wash our hands of the historical debt, so why not do it? (Of course, if this approach still fails for a particular user, we drop it and move on; people have to look forward.)
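For anyone who wants to try the same thing, the key pieces on the Tez side look roughly like this (a sketch, not our exact config; the tarball name and HDFS path are purely illustrative):

  <!-- tez-site.xml: point Tez at a tarball that already contains the Hadoop 3.4.1 jars -->
  <property>
    <name>tez.lib.uris</name>
    <value>hdfs:///apps/tez/tez-with-hadoop-3.4.1.tar.gz</value>
  </property>
  <!-- and do NOT pull the cluster's (3.1.0) Hadoop jars onto the Tez classpath -->
  <property>
    <name>tez.use.cluster.hadoop-libs</name>
    <value>false</value>
  </property>

With that, the only thing taken from the 3.1.0 cluster is the YARN/HDFS wire protocol, not its jars.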


As for the REST Catalog, its original intent is sound, yet I can’t shake a 
strong sense of déjà vu. Is it really wise to discard the long-honed power of 
the metadata system itself and expose everything through a generic protocol 
that will always be treated as a second-class citizen? The world works like 
this: there may be a universal law or framework, but the closer you get to 
reality, the more compromises and distortions pile up. So my view of 
the REST Catalog remains that of a generic, second-tier protocol. (I wouldn't rule out the possibility that one day we'll find ourselves returning to a purpose-built metadata system.)
Still, no matter what, jettisoning historical debt and marching forward 
together is unquestionably the right call.


Lisoda.





At 2025-12-24 06:23:27, "Edward Capriolo" <[email protected]> wrote:

Hey all,
I have been out of the game for a while but I am getting active again. My 
opinion is 'out-with the old, in the with the new'. I often work in regulated 
environments. The first thing they look at is the OSS issues and the problem is 
nightmare level, see all this red vulnerability stuff: 
https://mvnrepository.com/artifact/org.apache.hive/hive-exec?p=2 


I am building Hadoop on Alpine. hadoop-common, even in the latest Hadoop 3.4.2, still wants to use protobuf 2.5.0. You can't even find a version of Alpine that ships that protobuf! That's how old it is. That dependency then forces everything downstream of it to have the same problem: Hive will need to include a protobuf lib that is 6 years old to keep up with a Hadoop that is 8 years old. Protobuf 2.5.0 is this old:
https://groups.google.com/g/protobuf/c/CZngjTrQqdI?pli=1

"This is how Spark does it, which is also the main reason why users are more 
likely to adopt Spark as a SQL engine"


Spark somehow still depends on Hive 2.8:
https://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.13/3.5.7

But if you read between the lines, that is going away: HMS is end-of-life, and Unity Catalog goes forward.
https://community.databricks.com/t5/data-engineering/hive-metastore-end-of-life/td-p/136152


No disrespect to any of the platform maintainers; I understand their work and its value. "HDP", "CDH", "Bigtop" and all the distros. They shouldn't be Hive's target. It results in this insane backporting, supporting protobuf from 2013 just to run on Rocky Linux. No one builds software like this anymore. StarRocks https://www.starrocks.io/ runs on the latest Ubuntu.


Back in the day Hive built off master; there were no CDHs or HDPs. Then came the "shim layer", which was clever at the time but is now a crutch: it makes everyone target the past. Literally targeting a protobuf version from 2013.


All the DuckDB people brag on LinkedIn that they can query JSON. Hive is so unhip I have to jam it into the conversation :) If you see someone talking about a metastore, it is Nessie, Polaris, Unity Catalog, or Glue!


See this:
https://issues.apache.org/jira/browse/HADOOP-19756


A fortress of if statements to make it run on Sun and glibc RedHat 4 :). If we keep maintaining this basura we just continue the push into irrelevance.

It's really time to stop targeting the "distros"; they are fading fast. Make it build on master, make it build on Alpine.

https://github.com/edwardcapriolo/edgy-ansible/blob/main/imaging/hadoop/compositions/ha_rm_zk/compose.yml


Hip, Docker, features, winning! Not "compatibility with RH4 running on a mainframe, stable releases from the vendor".


Edward 


On Wed, Oct 9, 2024 at 8:02 AM lisoda <[email protected]> wrote:

HI TEAM.


I would like to discuss with everyone the issue of running Hive 4 in Hadoop environments below version 3.3.6. Currently, a large number of Hive users are still on older environments such as Hadoop 2.6/2.7/3.1.1. To be honest, upgrading Hadoop is a challenging task, and we cannot force users to upgrade their Hadoop clusters just to use Hive 4. To encourage these potential users to adopt Hive 4, we need to provide a general solution that allows Hive 4 to run on older Hadoop versions (at a minimum, we need to address compatibility with Hadoop 3.1.0).
The general plan is as follows: in both the Hive and Tez projects, in addition to the existing tar packages, we should also provide tar packages that bundle recent Hadoop dependencies. Through configuration, users can avoid picking up any jar dependencies from the Hadoop cluster, which means they can launch Tez jobs on an older Hadoop cluster using only the built-in Hadoop dependencies.
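To make that concrete, here is a rough sketch of what such a package could look like (the names below are purely illustrative, not a naming proposal):

  apache-hive-4.x-bin-with-hadoop-3.4.1/
    bin/               # hive launcher scripts
    lib/               # Hive jars
    hadoop/            # a bundled Hadoop 3.4.1 client install (bin/ plus jars)
    conf/hive-env.sh   # e.g. export HADOOP_HOME=<install dir>/hadoop

The launcher would then resolve every Hadoop class from the bundled copy, and the running cluster would only be reached over RPC.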
This is how Spark does it, which is also the main reason why users are more 
likely to adopt Spark as a SQL engine. Spark not only provides tar packages 
without Hadoop dependencies but also provides tar packages with built-in Hadoop 
3 and Hadoop 2. Users can upgrade to a new version of Spark without upgrading 
the Hadoop version.
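For reference, the two flavors Spark publishes work like this: the "with Hadoop" tarballs simply ship the Hadoop client jars inside jars/, while the "without Hadoop" build expects the user to wire in the cluster's jars through conf/spark-env.sh, roughly:

  # only needed for the "without hadoop" build; the bundled builds need nothing from the cluster
  export SPARK_DIST_CLASSPATH=$(hadoop classpath)

What I am proposing for Hive/Tez corresponds to the first, batteries-included flavor.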
We have implemented such a plan in our production environment and have successfully run Hive 4.0.0 and 4.0.1 on HDP 3.1.0; they are currently working well.
Based on our successful experience, I believe it is necessary for us to provide tar packages with all Hadoop dependencies built in. At the very least, we should document how users can successfully run Hive 4 on older Hadoop versions in this way.
However, my idea may not be fully mature, so I would like to know what others think. It would be great if more people could join this topic and discuss it.




TKS.
LISODA.
