I'm not saying that this issue should be a blocker for 2.4.1; rather, I'm looking for help moving things along. I'm not a committer on any of the Spark, Parquet, or Avro projects.
> On Mar 10, 2019, at 8:53 PM, Sean Owen <sro...@gmail.com> wrote:
>
> From https://issues.apache.org/jira/browse/SPARK-25588, I'm reading that:
>
> - this is a Parquet-Avro version conflict thing
> - a downstream app wants different versions of Parquet and Avro than
>   Spark uses, which triggers it

Prior to 2.4.0, Spark depended on versions of Parquet and Avro that did not work with each other. In fact, a different version of Avro had to be used in Spark's test scope to prevent runtime errors. As a workaround, we had to override Parquet to 1.8.2 (later 1.8.3) but pin parquet-avro to 1.8.1.

> - it doesn't work in 2.4.0

In 2.4.0 we're no longer able to pin parquet-avro to 1.8.1, so our workaround is broken. Using the Spark 2.4.0 versions of Parquet and Avro uncovers this new error.

> It's not a regression from 2.4.0, which is the immediate question.
> There isn't even a Parquet fix available.

I believe https://github.com/apache/parquet-mr/pull/560/files is a fix, but I haven't made all the necessary snapshot builds to test. I was waiting for a valid Spark 2.4.1 RC to try.

> But I'm not even seeing why this is excuse-making?
>
> On Sun, Mar 10, 2019 at 8:44 PM Mark Hamstra <m...@clearstorydata.com> wrote:
>>
>> Now wait... we created a regression in 2.4.0. Arguably, we should have
>> blocked that release until we had a fix; but the issue came up late in the
>> release process and it looks to me like there wasn't an adequate fix
>> immediately available, so we did something bad and released 2.4.0 with a
>> known regression. Saying that there is now no regression from 2.4 is
>> tautological and no excuse for not taking in a fix -- and it looks like that
>> fix has been waiting for months.
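P.S. For anyone hitting the same conflict: the pinning workaround described above looked roughly like the sbt fragment below. This is only a sketch; the exact set of Parquet module coordinates in your build may differ, and the surrounding build settings are assumed.

```scala
// Sketch of the pre-2.4.0 workaround: force Parquet to 1.8.2 overall,
// but keep parquet-avro pinned at 1.8.1 so it stays compatible with the
// Avro version Spark pulls in. Module list is illustrative, not exhaustive.
dependencyOverrides ++= Seq(
  "org.apache.parquet" % "parquet-hadoop" % "1.8.2",
  "org.apache.parquet" % "parquet-column" % "1.8.2",
  "org.apache.parquet" % "parquet-avro"   % "1.8.1"
)
```

As noted above, this stopped working against Spark 2.4.0, because parquet-avro can no longer be held back at 1.8.1 there.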