Hi Pralabh,

This question has been asked before :)

A few years ago (late 2016), I gave a presentation for Hortonworks on
running Hive queries on the Spark execution engine.

https://www.slideshare.net/MichTalebzadeh1/query-engines-for-hive-mr-spark-tez-with-llap-considerations

The main issue you will face is version compatibility between Hive and
Spark.

My suggestion would be to use Spark as the massively parallel processing
engine and Hive as the storage layer. However, you will need to test which
queries can be migrated and which cannot.
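
To make the testing step concrete, here is a minimal sketch of a comparison
harness. It is only an illustration under my own assumptions: the names
classify_queries, same_result, run_on_hive and run_on_spark are hypothetical,
and the two runner callables stand in for whatever client you use to submit
a query to each engine and fetch the rows back.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

Row = Tuple  # one result row, as a tuple of column values
Runner = Callable[[str], Iterable[Row]]  # hypothetical: runs SQL, returns rows


def same_result(hive_rows: Iterable[Row], spark_rows: Iterable[Row]) -> bool:
    """Compare two result sets as multisets, ignoring row order."""
    return Counter(map(tuple, hive_rows)) == Counter(map(tuple, spark_rows))


def classify_queries(
    queries: List[str], run_on_hive: Runner, run_on_spark: Runner
) -> Dict[str, str]:
    """Label each query 'ok', 'mismatch', or 'error: ...' after running it on both engines."""
    report: Dict[str, str] = {}
    for sql in queries:
        try:
            expected = list(run_on_hive(sql))
            actual = list(run_on_spark(sql))
        except Exception as exc:  # e.g. syntax or a UDF that one engine rejects
            report[sql] = f"error: {exc}"
            continue
        report[sql] = "ok" if same_result(expected, actual) else "mismatch"
    return report
```

With PySpark, run_on_spark could wrap spark.sql(sql).collect() on a session
created with SparkSession.builder.enableHiveSupport().getOrCreate(), so that
Spark reads the existing Hive metastore tables. Note that this exact-match
comparison is strict: floating-point columns may need a tolerance before you
treat a difference as a real mismatch.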

HTH


Mich


   view my LinkedIn profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Thu, 1 Jul 2021 at 10:52, Pralabh Kumar <pralabhku...@gmail.com> wrote:

> Hi Dev
>
> I have thousands of legacy Hive queries. As part of a plan to move to
> Spark, we are planning to migrate these Hive queries to Spark. There are
> two approaches:
>
>
>    1. One is Hive on Spark, which is similar to changing the execution
>    engine for Hive queries, as is done with Tez.
>    2. The other is migrating Hive queries to HiveContext/Spark SQL, an
>    approach used by Facebook and presented at a Spark conference:
>
> https://databricks.com/session/experiences-migrating-hive-workload-to-sparksql#:~:text=Spark%20SQL%20in%20Apache%20Spark,SQL%20with%20minimal%20user%20intervention
>
>
> Can you please advise which option to go for? I am personally inclined
> to go for option 2. It also allows the use of the latest Spark.
>
> Please help me with this, as there are not many comparisons available
> online that take Spark 3.0 into account.
>
> Regards
> Pralabh Kumar
>
>
>
