Yes, we tried Hive and want to migrate to Spark for better performance. I am 
using Parquet tables, but still see no better performance while loading.

> On Aug 20, 2017, at 2:24 AM, Jörn Franke <jornfra...@gmail.com> wrote:
> 
> Have you run the same job directly in Hive to see how the performance compares? 
> 
> In which format do you expect Hive to write? Have you made sure it is actually 
> in that format? It could be that you are using an inefficient one (e.g. CSV + 
> bzip2).
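> 
> A quick way to verify, assuming a hypothetical table name mydb.mytable, is to 
> inspect the table metadata from a spark-shell (where spark is the predefined 
> session); the storage section of the output shows the actual input/output 
> format and SerDe:
> 
>     // Print the table's storage metadata (InputFormat, OutputFormat, SerDe).
>     // mydb.mytable is a placeholder name.
>     spark.sql("DESCRIBE FORMATTED mydb.mytable").show(100, truncate = false)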
> 
>> On 20. Aug 2017, at 03:18, KhajaAsmath Mohammed <mdkhajaasm...@gmail.com> 
>> wrote:
>> 
>> Hi,
>> 
>> I have written a Spark SQL job on Spark 2.0 using Scala. It just pulls the 
>> data from a Hive table, adds extra columns, removes duplicates, and then 
>> writes it back to Hive.
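>> 
>> For illustration, a minimal sketch of such a job; the table and column names 
>> (mydb.source_table, mydb.target_table, load_ts) are made-up placeholders:
>> 
>>     import org.apache.spark.sql.SparkSession
>>     import org.apache.spark.sql.functions.current_timestamp
>> 
>>     val spark = SparkSession.builder()
>>       .appName("hive-dedup-job")
>>       .enableHiveSupport()  // needed to read from and write to Hive tables
>>       .getOrCreate()
>> 
>>     // Pull the data from the source Hive table, add an extra column,
>>     // and remove duplicate rows.
>>     val df = spark.table("mydb.source_table")
>>       .withColumn("load_ts", current_timestamp())
>>       .dropDuplicates()
>> 
>>     // Write the result back to Hive.
>>     df.write.mode("overwrite").insertInto("mydb.target_table")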
>> 
>> In the Spark UI, it is taking almost 40 minutes to write 400 GB of data. Is 
>> there anything I can do to improve performance?
>> 
>> spark.sql.shuffle.partitions is 2000 in my case, with executor memory of 
>> 16 GB and dynamic allocation enabled.
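>> 
>> For reference, one way those settings could be expressed when building the 
>> session (a sketch using the values above; note that dynamic allocation on 
>> YARN also requires the external shuffle service):
>> 
>>     import org.apache.spark.sql.SparkSession
>> 
>>     val spark = SparkSession.builder()
>>       .config("spark.sql.shuffle.partitions", "2000")
>>       .config("spark.executor.memory", "16g")
>>       .config("spark.dynamicAllocation.enabled", "true")
>>       .config("spark.shuffle.service.enabled", "true") // required by dynamic allocation on YARN
>>       .enableHiveSupport()
>>       .getOrCreate()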
>> 
>> I am doing an insert overwrite on a partitioned table:
>> df.write.mode("overwrite").insertInto(table)
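>> 
>> (insertInto on a partitioned Hive table may also need dynamic partitioning 
>> enabled; a sketch, with mydb.target_table as a placeholder name:)
>> 
>>     // Allow Hive-style dynamic partition inserts before the write.
>>     spark.sql("SET hive.exec.dynamic.partition=true")
>>     spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
>> 
>>     df.write.mode("overwrite").insertInto("mydb.target_table")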
>> 
>> Any suggestions, please?
>> 
