+1 (non-binding) On Wed, Jul 20, 2016 at 2:48 PM Michael Allman <mich...@videoamp.com> wrote:
> I've run some tests with some real and some synthetic parquet data with
> nested columns, with and without the Hive metastore, on our Spark 1.5, 1.6
> and 2.0 versions. I haven't seen any unexpected performance surprises,
> except that Spark 2.0 now does schema inference across all files in a
> partitioned parquet metastore table. Granted, you aren't using a metastore
> table, but maybe Spark does that for partitioned non-metastore tables as
> well.
>
> Michael
>
> > On Jul 20, 2016, at 2:16 PM, Maciej Bryński <mac...@brynski.pl> wrote:
> >
> > @Michael,
> > I answered in Jira and will repeat here.
> > I think that my problem is unrelated to Hive, because I'm using the
> > read.parquet method.
> > I also attached some VisualVM snapshots to SPARK-16321 (I think I should
> > merge both issues).
> > Code profiling suggests the bottleneck is in reading the parquet file.
> >
> > I wonder if there are any other benchmarks related to parquet
> > performance.
> >
> > Regards,
> > --
> > Maciek Bryński
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
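For readers landing on this thread: Michael's observation about Spark inferring a schema across all files relates to parquet schema merging. A hedged sketch of knobs available around Spark 1.5–2.0 (these are real Spark SQL options, but the thread does not confirm they fix Maciej's case):

```properties
# spark-defaults.conf -- a sketch, not a confirmed fix for this thread:
# don't merge schemas across every parquet part-file on read
# (this has been the default since Spark 1.5)
spark.sql.parquet.mergeSchema    false
```

Equivalently, per read: `spark.read.option("mergeSchema", "false").parquet(path)`, or supply an explicit schema with `spark.read.schema(mySchema).parquet(path)` so no files need to be footer-scanned for inference at all (`mySchema` here is a hypothetical pre-built `StructType`).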