Have you tried running the equivalent query directly in Hive to compare performance?

In which format do you expect Hive to write? Have you verified that the table is actually 
stored in that format? It could be that you are using an inefficient format (e.g. CSV + bzip2).
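
To check, something along these lines should work from Spark (a rough sketch only; 
"db.my_table" is a placeholder and it assumes a SparkSession built with Hive support):

    import org.apache.spark.sql.SparkSession

    // Sketch: replace "db.my_table" with your actual Hive table.
    val spark = SparkSession.builder()
      .appName("check-hive-table-format")
      .enableHiveSupport()
      .getOrCreate()

    // DESCRIBE FORMATTED reports the SerDe / InputFormat, i.e. whether the
    // table is stored as text (CSV), Parquet, ORC, etc.
    spark.sql("DESCRIBE FORMATTED db.my_table").show(100, truncate = false)

    // If it turns out to be text + bzip2, writing to a columnar, splittable
    // format such as Parquet or ORC is usually much cheaper, e.g.:
    // spark.sql("CREATE TABLE db.my_table_parquet STORED AS PARQUET AS SELECT * FROM db.my_table")

The same information is available from the Hive CLI via DESCRIBE FORMATTED <table>.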

> On 20. Aug 2017, at 03:18, KhajaAsmath Mohammed <mdkhajaasm...@gmail.com> 
> wrote:
> 
> Hi,
> 
> I have written a Spark SQL job on Spark 2.0 using Scala. It just pulls the 
> data from a Hive table, adds extra columns, removes duplicates, and then 
> writes it back to Hive.
> 
> In the Spark UI, it is taking almost 40 minutes to write 400 GB of data. Is there 
> anything I can do to improve performance?
> 
> spark.sql.shuffle.partitions is 2000 in my case, with 16 GB of executor memory and 
> dynamic allocation enabled.
> 
> I am doing an insert overwrite on a partitioned table:
> df.write.mode("overwrite").insertInto(table)
> 
> Any suggestions, please?
> 
> Sent from my iPhone

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
