Correct me if I'm wrong: this is a long-standing Spark bug that is very
hard to trigger, but the new Parquet version happens to hit the trigger
condition and exposes the bug. If that is the case, I'm +1 on fixing the Spark
bug instead of downgrading the Parquet version.

Let's move the technical discussions to
https://github.com/apache/spark/pull/50594.

On Tue, Apr 22, 2025 at 11:20 AM Manu Zhang <owenzhang1...@gmail.com> wrote:

> I don't think PARQUET-2432 itself has any issue. It appears to have
> triggered a deadlock like the one in https://github.com/apache/spark/pull/50594.
> I'd suggest that we fix forward if possible.
>
> Thanks,
> Manu
>
> On Mon, Apr 21, 2025 at 11:19 PM Rozov, Vlad <vro...@amazon.com.invalid>
> wrote:
>
>> The deadlock is reproducible without Parquet. Please see
>> https://github.com/apache/spark/pull/50594.
>>
>> Thank you,
>>
>> Vlad
>>
>> On Apr 21, 2025, at 1:59 AM, Cheng Pan <pan3...@gmail.com> wrote:
>>
>> The deadlock was introduced by PARQUET-2432 (1.14.0). If we decide to
>> downgrade, the latest workable version is Parquet 1.13.1.
>>
>> Thanks,
>> Cheng Pan
>>
>>
>>
>> On Apr 21, 2025, at 16:53, Wenchen Fan <cloud0...@gmail.com> wrote:
>>
>> +1 to downgrade to Parquet 1.15.0 for Spark 4.0. According to
>> https://github.com/apache/spark/pull/50583#issuecomment-2815243571 , the
>> Parquet CVE does not affect Spark.
>>
>> On Mon, Apr 21, 2025 at 2:45 PM Hyukjin Kwon <gurwls...@apache.org>
>> wrote:
>>
>>> That's nice, but we'd need to wait for them to release it and then upgrade, right?
>>> Let's revert the Parquet upgrade from the 4.0 branch since we're not directly
>>> affected by the CVE anyway.
>>>
>>> On Mon, 21 Apr 2025 at 15:42, Yuming Wang <yumw...@apache.org> wrote:
>>>
>>>> It seems this patch (https://github.com/apache/parquet-java/pull/3196)
>>>> can avoid the deadlock issue when using Parquet 1.15.1.
>>>>
>>>> On Wed, Apr 16, 2025 at 5:39 PM Niranjan Jayakar
>>>> <n...@databricks.com.invalid> wrote:
>>>>
>>>>> I found another bug introduced in 4.0 that breaks Spark Connect client/server
>>>>> compatibility: https://github.com/apache/spark/pull/50604.
>>>>>
>>>>> Once merged, this should be included in the next RC.
>>>>>
>>>>> On Thu, Apr 10, 2025 at 5:21 PM Wenchen Fan <cloud0...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>> version 4.0.0.
>>>>>>
>>>>>> The vote is open until April 15 (PST) and passes if a majority +1 PMC
>>>>>> votes are cast, with a minimum of 3 +1 votes.
>>>>>>
>>>>>> [ ] +1 Release this package as Apache Spark 4.0.0
>>>>>> [ ] -1 Do not release this package because ...
>>>>>>
>>>>>> To learn more about Apache Spark, please see
>>>>>> https://spark.apache.org/
>>>>>>
>>>>>> The tag to be voted on is v4.0.0-rc4 (commit
>>>>>> e0801d9d8e33cd8835f3e3beed99a3588c16b776)
>>>>>> https://github.com/apache/spark/tree/v4.0.0-rc4
>>>>>>
>>>>>> The release files, including signatures, digests, etc. can be found
>>>>>> at:
>>>>>> https://dist.apache.org/repos/dist/dev/spark/v4.0.0-rc4-bin/
>>>>>>
>>>>>> Signatures used for Spark RCs can be found in this file:
>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>>
>>>>>> The staging repository for this release can be found at:
>>>>>>
>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1480/
>>>>>>
>>>>>> The documentation corresponding to this release can be found at:
>>>>>> https://dist.apache.org/repos/dist/dev/spark/v4.0.0-rc4-docs/
>>>>>>
>>>>>> The list of bug fixes going into 4.0.0 can be found at the following
>>>>>> URL:
>>>>>> https://issues.apache.org/jira/projects/SPARK/versions/12353359
>>>>>>
>>>>>> This release is using the release script of the tag v4.0.0-rc4.
>>>>>>
>>>>>> FAQ
>>>>>>
>>>>>> =========================
>>>>>> How can I help test this release?
>>>>>> =========================
>>>>>>
>>>>>> If you are a Spark user, you can help us test this release by taking
>>>>>> an existing Spark workload, running it on this release candidate, and
>>>>>> reporting any regressions.
>>>>>>
>>>>>> If you're working in PySpark, you can set up a virtual env, install
>>>>>> the current RC, and see if anything important breaks. In Java/Scala,
>>>>>> you can add the staging repository to your project's resolvers and test
>>>>>> with the RC (make sure to clean up the artifact cache before/after so
>>>>>> you don't end up building with an out-of-date RC going forward).
>>>>>>
>>>>>
>>
>>