Hey all,
We (the Spark team) have been considering upgrading parquet-mr in
Spark to 1.8.1 to fix PARQUET-251
<https://issues.apache.org/jira/browse/PARQUET-251>. However, my
micro-benchmark shows that 1.8.1 seems to suffer a slight performance
regression (5% to 10%) compared to 1.7.0 (the version we are currently
using). I'm not sure whether this is a known issue. I did a quick
search on JIRA using

    project = parquet and affectedVersion in ("1.8.0", "1.8.1")

but didn't find any related tickets. In the micro-benchmark, I simply
read the whole TPC-DS store_sales table (scale factor 15).
The good news is that 1.8.2-SNAPSHOT looks fine, so upgrading directly
to 1.8.2 seems to be a better idea. Could anybody provide some details
about the 1.8.2 release schedule? Thanks in advance!
Cheng