Hi team,

I would like to discuss with everyone the issue of running Hive 4 on Hadoop
environments below version 3.3.6. A large number of Hive users are still on
older Hadoop versions such as 2.6, 2.7, and 3.1.1. Frankly, upgrading Hadoop is
a challenging task, and we cannot force users to upgrade their Hadoop clusters
just to use Hive 4. To encourage these potential users to adopt Hive 4, we need
a general solution that allows Hive 4 to run on older Hadoop versions (at a
minimum, we need to address the compatibility issues with Hadoop 3.1.0).

The general plan is as follows: in both the Hive and Tez projects, in addition
to the existing tar packages, we would also provide tar packages that bundle
the newer Hadoop dependencies. Through configuration, users can then avoid
picking up any jars from the Hadoop cluster, so Tez jobs submitted on an older
cluster run entirely against the bundled Hadoop classes; a sketch of such a
configuration follows.

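For illustration only, a minimal sketch of what the client-side configuration
could look like, assuming a full Tez tarball (one that bundles the newer
Hadoop jars) has been uploaded to HDFS. The two property names are standard
Apache Tez settings; the tarball path is just a placeholder:

    <!-- tez-site.xml on the Hive/Tez client -->
    <property>
      <!-- ship a Tez archive that bundles the newer Hadoop jars -->
      <name>tez.lib.uris</name>
      <value>hdfs:///apps/tez/tez-with-hadoop-deps.tar.gz</value>
    </property>
    <property>
      <!-- keep the cluster's (older) Hadoop jars off the task classpath -->
      <name>tez.use.cluster.hadoop-libs</name>
      <value>false</value>
    </property>

With tez.use.cluster.hadoop-libs set to false, the containers localize the
shipped archive and run against the bundled Hadoop classes rather than
whatever version is installed on the cluster nodes.
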
This is how Spark does it, and it is one of the main reasons users are more
likely to adopt Spark as a SQL engine. Spark not only provides tar packages
without Hadoop dependencies, it also provides tar packages with Hadoop 3 or
Hadoop 2 built in, so users can upgrade to a new Spark version without
upgrading their Hadoop version.
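
To make the comparison concrete, these are the kinds of artifacts Spark
publishes (the names here are from the 3.1.3 release as I recall them; exact
names vary by release):

    spark-3.1.3-bin-hadoop3.2.tgz        (Hadoop 3.2 bundled)
    spark-3.1.3-bin-hadoop2.7.tgz        (Hadoop 2.7 bundled)
    spark-3.1.3-bin-without-hadoop.tgz   (no Hadoop; user supplies it)

The "without-hadoop" build expects the user to export
SPARK_DIST_CLASSPATH=$(hadoop classpath) so that Spark runs against the
cluster's own jars; the bundled builds need no such step.
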
We have already implemented this plan in our production environment and have
successfully run Hive 4.0.0 and Hive 4.0.1 on HDP 3.1.0. Both are currently
working well.

Based on this successful experience, I believe it is necessary for us to
provide tar packages with all Hadoop dependencies built in. At the very least,
we should document how users can successfully run Hive 4 on older Hadoop
versions in this way.

However, my idea may not be fully mature, so I would like to hear what others
think. It would be great if others could join in and discuss this topic.

Thanks,
LISODA
