+1 (non-binding) On Wed, Jul 20, 2016 at 2:48 PM Michael Allman <mich...@videoamp.com> wrote:
> I've run some tests with some real and some synthetic parquet data with
> nested columns, with and without the Hive metastore, on our Spark 1.5, 1.6
> and 2.0 versions. I haven't seen any unexpected performance surprises,
> except that Spark 2.0 now does schema inference across all files in a
> partitioned parquet metastore table. Granted, you aren't using a metastore
> table, but maybe Spark does that for partitioned non-metastore tables as
> well.
>
> Michael
>
> > On Jul 20, 2016, at 2:16 PM, Maciej Bryński <mac...@brynski.pl> wrote:
> >
> > @Michael,
> > I answered in Jira and will repeat here.
> > I think that my problem is unrelated to Hive, because I'm using the
> > read.parquet method.
> > I also attached some VisualVM snapshots to SPARK-16321 (I think I should
> > merge both issues).
> > Code profiling suggests the bottleneck is in reading the parquet file.
> >
> > I wonder if there are any other benchmarks related to parquet
> > performance.
> >
> > Regards,
> > --
> > Maciek Bryński
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
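For readers landing on this thread: Michael's observation about Spark inferring a schema across all files relates to parquet schema merging. A hedged sketch of knobs available around Spark 1.5–2.0 (these are real Spark SQL options, but the thread does not confirm they fix Maciej's case):

```properties
# spark-defaults.conf -- a sketch, not a confirmed fix for this thread:
# don't merge schemas across every parquet part-file on read
# (this has been the default since Spark 1.5)
spark.sql.parquet.mergeSchema    false
```

Equivalently, per read: `spark.read.option("mergeSchema", "false").parquet(path)`, or supply an explicit schema with `spark.read.schema(mySchema).parquet(path)` so no files need to be footer-scanned for inference at all (`mySchema` here is a hypothetical pre-built `StructType`).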