We are using Parquet tables; could they be causing any performance issues?

On Sun, Aug 20, 2017 at 9:09 AM, Jörn Franke <jornfra...@gmail.com> wrote:

> Improving the performance of Hive can also be done by switching to
> Tez+LLAP as the execution engine.
> Aside from this: you need to check which format it writes to Hive by
> default. One reason for the slow write into a Hive table could be that
> it writes by default to CSV/gzip or CSV/bzip2.
>
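> A quick way to rule that out on the Spark side is to write Parquet
> explicitly instead of relying on the session or table default. A minimal
> sketch (the DataFrame `df` and table name `my_table` are illustrative):
>
>     import org.apache.spark.sql.SaveMode
>
>     // Force Parquet explicitly; a text-based default plus gzip/bzip2
>     // would explain a slow write into the Hive table.
>     df.write
>       .format("parquet")
>       .mode(SaveMode.Overwrite)
>       .saveAsTable("my_table")
>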
> > On 20. Aug 2017, at 15:52, KhajaAsmath Mohammed <mdkhajaasm...@gmail.com> wrote:
> >
> > Yes, we tried Hive and want to migrate to Spark for better performance. I
> > am using Parquet tables. Still no better performance while loading.
> >
> > Sent from my iPhone
> >
> >> On Aug 20, 2017, at 2:24 AM, Jörn Franke <jornfra...@gmail.com> wrote:
> >>
> >> Have you tried running it directly in Hive to see how the performance is?
> >>
> >> In which format do you expect Hive to write? Have you made sure it is
> >> in this format? It could be that you are using an inefficient format
> >> (e.g. CSV + bzip2).
> >>
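> >> One way to verify the table's actual storage format (a sketch, assuming
> >> a SparkSession `spark` with Hive support and a hypothetical table name
> >> `my_table`):
> >>
> >>     // Check the "SerDe Library" / "InputFormat" rows in the output:
> >>     // a Parquet table shows ParquetHiveSerDe, while a text table
> >>     // shows LazySimpleSerDe + TextInputFormat.
> >>     spark.sql("DESCRIBE FORMATTED my_table").show(100, false)
> >>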
> >>> On 20. Aug 2017, at 03:18, KhajaAsmath Mohammed <mdkhajaasm...@gmail.com> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I have written a Spark SQL job on Spark 2.0 using Scala. It just pulls
> >>> the data from a Hive table, adds extra columns, removes duplicates, and
> >>> then writes it back to Hive.
> >>>
> >>> In the Spark UI, it is taking almost 40 minutes to write 400 GB of data.
> >>> Is there anything I can do to improve performance?
> >>>
> >>> spark.sql.shuffle.partitions is 2000 in my case, with executor memory of
> >>> 16 GB and dynamic allocation enabled.
> >>>
> >>> I am doing an insert overwrite on the partitioned table:
> >>> df.write.mode("overwrite").insertInto("table")
> >>>
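> >>> For reference, here is roughly the whole job as a minimal sketch (the
> >>> table and column names are hypothetical):
> >>>
> >>>     import org.apache.spark.sql.SparkSession
> >>>     import org.apache.spark.sql.functions.current_date
> >>>
> >>>     val spark = SparkSession.builder()
> >>>       .appName("hive-dedup-job")
> >>>       .config("spark.sql.shuffle.partitions", "2000")
> >>>       .enableHiveSupport()
> >>>       .getOrCreate()
> >>>
> >>>     // Pull from Hive, add an extra column, drop duplicates, then
> >>>     // overwrite the partitions of the (pre-existing) target table.
> >>>     val df = spark.table("src_table")
> >>>       .withColumn("load_date", current_date())
> >>>       .dropDuplicates()
> >>>
> >>>     df.write.mode("overwrite").insertInto("tgt_table")
> >>>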
> >>> Any suggestions, please?
> >>>
> >>> Sent from my iPhone
> >>>
>
