We are using Parquet tables. Could that be causing the performance issue?

On Sun, Aug 20, 2017 at 9:09 AM, Jörn Franke <jornfra...@gmail.com> wrote:
> Performance in Hive itself can also be improved by switching the execution
> engine to Tez + LLAP.
> Aside from this, you need to check which format the job writes to Hive by
> default. One reason for slow writes into a Hive table could be that it
> writes CSV/gzip or CSV/bzip2 by default.
>
> > On 20. Aug 2017, at 15:52, KhajaAsmath Mohammed <mdkhajaasm...@gmail.com> wrote:
> >
> > Yes, we tried Hive and want to migrate to Spark for better performance.
> > I am using Parquet tables. Still no improvement in load performance.
> >
> >> On Aug 20, 2017, at 2:24 AM, Jörn Franke <jornfra...@gmail.com> wrote:
> >>
> >> Have you tried running it directly in Hive to see how it performs?
> >>
> >> Which format do you expect Hive to write? Have you made sure it
> >> actually writes in this format? You could be using an inefficient
> >> format (e.g. CSV + bzip2).
> >>
> >>> On 20. Aug 2017, at 03:18, KhajaAsmath Mohammed <mdkhajaasm...@gmail.com> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I have written a Spark SQL job on Spark 2.0 using Scala. It just
> >>> pulls the data from a Hive table, adds extra columns, removes
> >>> duplicates and then writes it back to Hive.
> >>>
> >>> In the Spark UI, it takes almost 40 minutes to write 400 GB of data.
> >>> Is there anything I need to do to improve performance?
> >>>
> >>> spark.sql.shuffle.partitions is 2000 in my case, with executor memory
> >>> of 16 GB and dynamic allocation enabled.
> >>>
> >>> I am doing an insert overwrite into a partitioned table with
> >>> df.write.mode(SaveMode.Overwrite).insertInto(table)
> >>>
> >>> Any suggestions, please?
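For reference, here is a minimal sketch of the job described in the original question (read a Hive table, remove duplicates, add a column, insert-overwrite into a partitioned Hive table), plus Jörn's suggested check of the target table's storage format. It assumes Spark 2.x with Hive support; the table names `db.source_table` / `db.target_table` and the `load_ts` column are hypothetical placeholders, not taken from the thread, and the target table is assumed to already exist with a `load_ts` column.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}
import org.apache.spark.sql.functions.{col, current_timestamp}

// Hypothetical sketch of the job from the thread; names are placeholders.
object DedupAndLoad {
  val SourceTable = "db.source_table" // hypothetical source table
  val TargetTable = "db.target_table" // hypothetical, assumed partitioned Hive table with a load_ts column

  // Print the target table's metadata. The InputFormat / SerDe rows show
  // whether it is really stored as Parquet or as text (e.g. CSV + gzip).
  def checkStorageFormat(spark: SparkSession): Unit =
    spark.sql(s"DESCRIBE FORMATTED $TargetTable").show(100, truncate = false)

  // Read the source, drop exact duplicate rows, then add the extra column.
  def transform(spark: SparkSession): DataFrame =
    spark.table(SourceTable)
      .dropDuplicates()
      .withColumn("load_ts", current_timestamp()) // hypothetical extra column

  def load(spark: SparkSession, df: DataFrame): Unit = {
    // Allow dynamic partition inserts into the Hive table.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    // insertInto matches columns by position, not by name, so reorder the
    // DataFrame to the target table's column order (partition columns last).
    val targetCols = spark.table(TargetTable).columns.map(col)
    df.select(targetCols: _*)
      .write
      .mode(SaveMode.Overwrite)
      .insertInto(TargetTable)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dedup-and-load")
      .enableHiveSupport()
      .getOrCreate()
    checkStorageFormat(spark)
    load(spark, transform(spark))
    spark.stop()
  }
}
```

Reordering by the target table's columns is a defensive choice: `withColumn` appends `load_ts` at the end, which would otherwise land in the partition column's position during a positional `insertInto`.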