Hey all, I have been out of the game for a while, but I am getting active again. My opinion is "out with the old, in with the new." I often work in regulated environments. The first thing they look at is the OSS vulnerabilities, and the problem is nightmare level. See all the red vulnerability flags here: https://mvnrepository.com/artifact/org.apache.hive/hive-exec?p=2
I am building Hadoop on Alpine. hadoop-common, even in the latest Hadoop 3.4.2, still wants to use protobuf 2.5.0. You can't even find a version of Alpine that has that protobuf! That's how old it is. That dependency then forces everything downstream to have the same problem: Hive has to include a protobuf lib that is 6 years old to keep up with a Hadoop that is 8 years old. Protobuf 2.5.0 is this old: https://groups.google.com/g/protobuf/c/CZngjTrQqdI?pli=1

"This is how Spark does it, which is also the main reason why users are more likely to adopt Spark as a SQL engine"

Spark somehow still depends on Hive 2.3.x: https://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.13/3.5.7. But if you read between the lines, that is going away: HMS is end of life, Unity Catalog goes forward. https://community.databricks.com/t5/data-engineering/hive-metastore-end-of-life/td-p/136152

No disrespect to any of the platform maintainers; I understand their work and its value. But "HDP", "CDH", "Bigtop" and all the distros should not be Hive's target. Targeting them results in this insane backporting, supporting protobuf from 2013 so it can run on Rocky Linux. No one builds software like this anymore. StarRocks (https://www.starrocks.io/) is running on ubuntu latest.

Back in the day Hive built off master; there were no CDHs or HDPs. Then came the "shim layer", which was clever, but now it is a crutch: it makes everyone target the past. Literally targeting a protobuf version from 2013. All the DuckDB people brag on LinkedIn that they can query JSON. Hive is so unhip I have to jam it into the conversation :) If you see someone talking about a metastore, it is Nessie, Polaris, Unity Catalog, or Glue!

See this: https://issues.apache.org/jira/browse/HADOOP-19756 - a fortress of if statements to make it run on Sun and glibc Red Hat 4 :) If we keep maintaining this basura, we just continue the push into irrelevance.

It's really time to stop targeting the "distros"; they are fading fast. Make it build on master, make it build on Alpine.
https://github.com/edwardcapriolo/edgy-ansible/blob/main/imaging/hadoop/compositions/ha_rm_zk/compose.yml
Hip, Docker, features, winning! Not "compatibility with RH4 running on a mainframe, stable releases from a vendor."
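If anyone wants to see the protobuf pin for themselves, here is a quick sanity check (nothing official, and the exact output depends on the branch and platform you are on):

    # from a Hadoop (or Hive) source checkout: which protobuf-java does the build pull in?
    mvn dependency:tree -Dincludes=com.google.protobuf:protobuf-java

    # and on Alpine: what protobuf can apk actually give you?
    docker run --rm alpine:latest sh -c 'apk update -q && apk search protobuf'

You will not find anything close to 2.5.0 in that second list.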
Edward

On Wed, Oct 9, 2024 at 8:02 AM lisoda <[email protected]> wrote:

> HI TEAM.
>
> I would like to discuss with everyone the issue of running Hive4 in Hadoop
> environments below version 3.3.6. Currently, a large number of Hive users
> are still using low-version environments such as Hadoop 2.6/2.7/3.1.1. To
> be honest, upgrading Hadoop is a challenging task. We cannot force users to
> upgrade their Hadoop cluster versions just to use Hive4. In order to
> encourage these potential users to adopt and use Hive4, we need to provide
> a general solution that allows Hive4 to run on low-version Hadoop (at least
> we need to address the compatibility issues with Hadoop version 3.1.0).
> The general plan is as follows: In both the Hive and Tez projects, in
> addition to providing the existing tar packages, we should also provide tar
> packages that include high-version Hadoop dependencies. By defining
> configuration files, users can avoid using any jar package dependencies
> from the Hadoop cluster. In this way, users can initiate Tez tasks on
> low-version Hadoop clusters using only the built-in Hadoop dependencies.
> This is how Spark does it, which is also the main reason why users are
> more likely to adopt Spark as a SQL engine. Spark not only provides tar
> packages without Hadoop dependencies but also provides tar packages with
> built-in Hadoop 3 and Hadoop 2. Users can upgrade to a new version of Spark
> without upgrading the Hadoop version.
> We have implemented such a plan in our production environment, and we have
> successfully run Hive4.0.0 and Hive4.0.1 in the HDP 3.1.0 environment. They
> are currently working well.
> Based on our successful experience, I believe it is necessary for us to
> provide tar packages with all Hadoop dependencies built in. At the very
> least, we should document that users can successfully run Hive4 on
> low-version Hadoop in this way.
> However, my idea may not be mature enough, so I would like to know what
> others think. It would be great if someone could participate in this topic
> and discuss it.
>
>
> TKS.
> LISODA.
>
>
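For anyone trying to picture the mechanics lisoda describes: Tez already has the knobs for this. You upload a Tez tarball that bundles its own Hadoop jars and tell Tez not to pick up the cluster's Hadoop libraries. A rough sketch only; the HDFS path and tarball version below are made up, and the exact layout is whatever your environment needs:

    <!-- tez-site.xml (sketch): run Tez from an uploaded tarball that carries its own
         Hadoop jars, instead of the jars installed on the cluster nodes -->
    <property>
      <name>tez.lib.uris</name>
      <!-- the full Tez tarball (not the -minimal one); path is illustrative -->
      <value>hdfs:///apps/tez/tez-0.10.x.tar.gz</value>
    </property>
    <property>
      <name>tez.use.cluster.hadoop-libs</name>
      <!-- false = use the Hadoop libraries bundled in the tarball, not the cluster's -->
      <value>false</value>
    </property>

Spark draws the same line on its download page with both "pre-built for Hadoop x" binaries and a "Hadoop free" build; the proposal, as I read it, is for Hive and Tez to publish the bundled flavor officially instead of leaving each user to assemble it.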
