Yes, we tried Hive and want to migrate to Spark for better performance. I am using Parquet tables, but I still see no better performance while loading.
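To double-check the storage format Jörn asks about below, you can inspect the table metadata from Spark itself. A minimal sketch, assuming a Hive-enabled SparkSession; the table name "db.my_table" is a placeholder:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("check-table-format")
      .enableHiveSupport()
      .getOrCreate()

    // Look at the InputFormat / SerDe rows in the output: a genuine Parquet
    // table should mention MapredParquetInputFormat and ParquetHiveSerDe.
    spark.sql("DESCRIBE FORMATTED db.my_table").show(100, truncate = false)

If those rows mention TextInputFormat or a codec like bzip2 instead, the table is not really Parquet-backed and slow loads would be expected.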
> On Aug 20, 2017, at 2:24 AM, Jörn Franke <jornfra...@gmail.com> wrote:
>
> Have you tried directly in Hive to see what the performance is?
>
> In which format do you expect Hive to write? Have you made sure it is
> actually in this format? It could be that you are using an inefficient
> format (e.g. CSV + bzip2).
>
>> On 20. Aug 2017, at 03:18, KhajaAsmath Mohammed <mdkhajaasm...@gmail.com> wrote:
>>
>> Hi,
>>
>> I have written a Spark SQL job on Spark 2.0 using Scala. It just pulls
>> the data from a Hive table, adds extra columns, removes duplicates, and
>> then writes it back to Hive.
>>
>> In the Spark UI, it takes almost 40 minutes to write 400 GB of data. Is
>> there anything I can do to improve performance?
>>
>> spark.sql.shuffle.partitions is 2000 in my case, with 16 GB of executor
>> memory and dynamic allocation enabled.
>>
>> I am doing an insert overwrite on the partitioned table with
>>
>>     df.write.mode("overwrite").insertInto(table)
>>
>> Any suggestions please?
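For reference, a minimal sketch of the kind of job described above; the table and column names are placeholders, and the dynamic-partition settings are an assumption about how the target table is partitioned:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder()
      .appName("hive-dedup-rewrite")
      .config("spark.sql.shuffle.partitions", "2000")
      // Assumed: needed when overwriting all partitions dynamically via insertInto.
      .config("hive.exec.dynamic.partition", "true")
      .config("hive.exec.dynamic.partition.mode", "nonstrict")
      .enableHiveSupport()
      .getOrCreate()

    // Pull the source table, add a derived column, and drop duplicate rows.
    val df = spark.table("db.source_table")
      .withColumn("load_ts", current_timestamp())
      .dropDuplicates()

    // insertInto matches columns by position, so the DataFrame's column
    // order must match the target table, with partition columns last.
    df.write.mode("overwrite").insertInto("db.target_table")

Note that dropDuplicates() with no arguments forces a full shuffle of the 400 GB before the write, so a large share of the 40 minutes is likely the shuffle rather than the Parquet write itself.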