I'm not saying that this issue should be a blocker for 2.4.1; rather, I'm 
looking for help moving things along.  I'm not a committer on any of the Spark, 
Parquet, or Avro projects.


> On Mar 10, 2019, at 8:53 PM, Sean Owen <sro...@gmail.com> wrote:
> 
> From https://issues.apache.org/jira/browse/SPARK-25588, I'm reading that:
> 
> - this is a Parquet-Avro version conflict thing
> - a downstream app wants different versions of Parquet and Avro than
> Spark uses, which triggers it

Prior to 2.4.0, Spark depended on versions of Parquet and Avro that did not 
work with each other.  In fact, a different version of Avro had to be used in 
Spark's test scope to prevent runtime errors.

As a workaround, we had to override Parquet to 1.8.2 (later 1.8.3) but pin 
parquet-avro to 1.8.1.
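
To make that concrete, a pin like the one described would look roughly like 
the following in an sbt build. The version numbers are the ones from this 
thread, but the module list and structure are a sketch, not our actual build:

```scala
// build.sbt fragment (sketch): force the Parquet core modules to 1.8.2
// everywhere, while holding parquet-avro back at 1.8.1 to avoid the
// runtime incompatibility described above.
dependencyOverrides ++= Seq(
  "org.apache.parquet" % "parquet-column" % "1.8.2",
  "org.apache.parquet" % "parquet-hadoop" % "1.8.2",
  "org.apache.parquet" % "parquet-avro"   % "1.8.1"
)
```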


> - it doesn't work in 2.4.0

In 2.4.0 we're no longer able to pin parquet-avro to 1.8.1, so our workaround 
is broken.  Using the Spark 2.4.0 versions of Parquet and Avro uncovers this 
new error.


> 
> It's not a regression from 2.4.0, which is the immediate question.
> There isn't even a Parquet fix available.

I believe https://github.com/apache/parquet-mr/pull/560/files is a fix, but I 
haven't made all the necessary snapshot builds to test it.  I was waiting for 
a valid Spark 2.4.1 RC to try.


> But I'm not even seeing why this is excuse-making?
> 
> On Sun, Mar 10, 2019 at 8:44 PM Mark Hamstra <m...@clearstorydata.com> wrote:
>> 
>> Now wait... we created a regression in 2.4.0. Arguably, we should have 
>> blocked that release until we had a fix; but the issue came up late in the 
>> release process and it looks to me like there wasn't an adequate fix 
>> immediately available, so we did something bad and released 2.4.0 with a 
>> known regression. Saying that there is now no regression from 2.4 is 
>> tautological and no excuse for not taking in a fix -- and it looks like that 
>> fix has been waiting for months.
