Hi Pralabh,

This question has been asked before :)
A few years ago (late 2016) I gave a presentation on running Hive queries on the Spark execution engine for Hortonworks:

https://www.slideshare.net/MichTalebzadeh1/query-engines-for-hive-mr-spark-tez-with-llap-considerations

The main issue you will face is version compatibility between Hive and Spark. My suggestion would be to use Spark as the massively parallel processing engine and Hive as the storage layer. However, you will need to test which queries can be migrated and which cannot.

HTH

Mich

LinkedIn profile: <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

*Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

On Thu, 1 Jul 2021 at 10:52, Pralabh Kumar <pralabhku...@gmail.com> wrote:

> Hi Dev
>
> I have thousands of legacy Hive queries. As part of our plan to move to
> Spark, we intend to migrate these Hive queries to Spark. There are two
> approaches:
>
> 1. Hive on Spark, which is similar to changing the execution engine in
>    Hive queries, as with Tez.
> 2. Migrating the Hive queries to HiveContext/Spark SQL, an approach used
>    by Facebook and presented at a Spark conference:
>
> https://databricks.com/session/experiences-migrating-hive-workload-to-sparksql#:~:text=Spark%20SQL%20in%20Apache%20Spark,SQL%20with%20minimal%20user%20intervention
>
> Can you please guide me on which option to go for? I am personally
> inclined to go for option 2, as it also allows the use of the latest Spark.
>
> Please help me with this, as there are not many comparisons available
> online that take Spark 3.0 into account.
>
> Regards
> Pralabh Kumar
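To make the suggestion in the reply concrete (Spark as the processing engine, Hive as the storage layer, with each legacy query tested before migration), here is a minimal sketch of a triage harness. It is only an illustration under stated assumptions, not something from the presentation: the names `triage_queries`, `QueryResult` and `make_spark_runner` are hypothetical, and the Spark-backed runner assumes pyspark is installed and a Hive metastore is reachable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class QueryResult:
    name: str
    ok: bool
    error: str = ""


def triage_queries(queries: Dict[str, str],
                   run_query: Callable[[str], object]) -> List[QueryResult]:
    """Pass each legacy query to `run_query` and record whether it succeeds."""
    results = []
    for name, sql in queries.items():
        try:
            run_query(sql)
            results.append(QueryResult(name, True))
        except Exception as exc:
            # Any failure flags the query for manual review.
            results.append(QueryResult(name, False, f"{type(exc).__name__}: {exc}"))
    return results


def make_spark_runner() -> Callable[[str], object]:
    """Hypothetical runner backed by Spark SQL reading Hive tables.

    Assumes pyspark is installed and the Hive metastore is configured.
    """
    # Imported lazily so the harness itself has no Spark dependency.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("hive-migration-triage")
        .enableHiveSupport()  # use the Hive metastore as the table catalog
        .getOrCreate()
    )
    # spark.sql() parses and analyses the statement. A SELECT is not executed
    # until an action is called, but DDL/DML statements run immediately, so
    # point this at a test environment.
    return lambda sql: spark.sql(sql)
```

With a Spark installation available, something like `triage_queries(legacy_queries, make_spark_runner())` would split the queries into those Spark SQL accepts as-is and those needing manual attention. A query that parses and analyses cleanly can still return different results than it did under Hive, so result comparison remains a separate step.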