One more small fix (on another topic) for the next RC:
https://github.com/apache/spark/pull/50685

Thanks!
Szehon

On Tue, Apr 22, 2025 at 10:07 AM Rozov, Vlad <vro...@amazon.com.invalid>
wrote:

> Correct, to me it looks like a Spark bug,
> https://issues.apache.org/jira/browse/SPARK-51821, that may be hard to
> trigger and can be reproduced using the test case provided in
> https://github.com/apache/spark/pull/50594:
>
> 1. The Spark UninterruptibleThread “task” is interrupted by the “test”
> thread while the “task” thread is blocked in an NIO operation.
> 2. The NIO operation is interruptible (the channel is an
> InterruptibleChannel). In the case of Parquet, it is a WritableByteChannel.
> 3. As part of handling the InterruptedException, the channel re-interrupts
> the “task” thread (
> https://github.com/apache/hadoop/blob/5770647dc73d552819963ba33f50be518058ee03/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java#L1029
> ), as illustrated by the sketch below.
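>
> For illustration, here is a minimal standalone sketch in Scala (not the
> Spark or Hadoop code in question; all names are made up) of the NIO
> behavior that step 3 relies on: interrupting a thread blocked on an
> InterruptibleChannel closes the channel, throws
> ClosedByInterruptException, and leaves the thread's interrupt status set:
>
>   import java.nio.ByteBuffer
>   import java.nio.channels.{ClosedByInterruptException, Pipe}
>
>   object ReInterruptSketch {
>     def main(args: Array[String]): Unit = {
>       // Pipe.sink() is a WritableByteChannel and an InterruptibleChannel,
>       // similar to the channel Parquet writes to in step 2.
>       val pipe = Pipe.open()
>       val task = new Thread(() => {
>         try {
>           // Keep writing until the pipe's buffer fills and write() blocks.
>           val buf = ByteBuffer.allocate(64 * 1024)
>           while (true) { buf.clear(); pipe.sink().write(buf) }
>         } catch {
>           case _: ClosedByInterruptException =>
>             // NIO closed the channel and re-set the interrupt flag.
>             println(s"flag = ${Thread.currentThread().isInterrupted}")
>         }
>       }, "task")
>       task.start()
>       Thread.sleep(500)  // crude stand-in for the "test" thread's timing
>       task.interrupt()   // "test" interrupts the blocked "task" (step 1)
>       task.join()
>     }
>   }
>
> This prints "flag = true": the interrupt is delivered again inside the
> exception handler, which is the second interrupt that the
> UninterruptibleThread machinery then has to cope with.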
>
> Thank you,
>
> Vlad
>
>
> On Apr 22, 2025, at 1:53 AM, Wenchen Fan <cloud0...@gmail.com> wrote:
>
> Correct me if I'm wrong: this is a long-standing Spark bug that is very
> hard to trigger, but the new Parquet version happens to hit the trigger
> condition and exposes the bug. If this is the case, I'm +1 to fix the Spark
> bug instead of downgrading the Parquet version.
>
> Let's move the technical discussions to
> https://github.com/apache/spark/pull/50594.
>
> On Tue, Apr 22, 2025 at 11:20 AM Manu Zhang <owenzhang1...@gmail.com>
> wrote:
>
>> I don't think PARQUET-2432 has any issue itself. It looks to have
>> triggered a latent deadlock, as demonstrated in
>> https://github.com/apache/spark/pull/50594.
>>
>> I'd suggest that we fix forward if possible.
>>
>> Thanks,
>> Manu
>>
>> On Mon, Apr 21, 2025 at 11:19 PM Rozov, Vlad <vro...@amazon.com.invalid>
>> wrote:
>>
>>> The deadlock is reproducible without Parquet. Please see
>>> https://github.com/apache/spark/pull/50594.
>>>
>>> Thank you,
>>>
>>> Vlad
>>>
>>> On Apr 21, 2025, at 1:59 AM, Cheng Pan <pan3...@gmail.com> wrote:
>>>
>>> The deadlock is introduced by PARQUET-2432 (1.14.0). If we decide to
>>> downgrade, the latest workable version is Parquet 1.13.1.
>>>
>>> Thanks,
>>> Cheng Pan
>>>
>>>
>>>
>>> On Apr 21, 2025, at 16:53, Wenchen Fan <cloud0...@gmail.com> wrote:
>>>
>>> +1 to downgrade to Parquet 1.15.0 for Spark 4.0. According to
>>> https://github.com/apache/spark/pull/50583#issuecomment-2815243571,
>>> the Parquet CVE does not affect Spark.
>>>
>>> On Mon, Apr 21, 2025 at 2:45 PM Hyukjin Kwon <gurwls...@apache.org>
>>> wrote:
>>>
>>>> That's nice, but we would need to wait for them to release and then
>>>> upgrade, right? Let's revert the Parquet upgrade from the 4.0 branch since
>>>> we're not directly affected by the CVE anyway.
>>>>
>>>> On Mon, 21 Apr 2025 at 15:42, Yuming Wang <yumw...@apache.org> wrote:
>>>>
>>>>> It seems this patch (https://github.com/apache/parquet-java/pull/3196)
>>>>> can avoid the deadlock issue when using Parquet 1.15.1.
>>>>>
>>>>> On Wed, Apr 16, 2025 at 5:39 PM Niranjan Jayakar
>>>>> <n...@databricks.com.invalid> wrote:
>>>>>
>>>>>> I found another bug introduced in 4.0 that breaks Spark Connect
>>>>>> client/server compatibility:
>>>>>> https://github.com/apache/spark/pull/50604.
>>>>>>
>>>>>> Once merged, this should be included in the next RC.
>>>>>>
>>>>>> On Thu, Apr 10, 2025 at 5:21 PM Wenchen Fan <cloud0...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>>> version 4.0.0.
>>>>>>>
>>>>>>> The vote is open until April 15 (PST) and passes if a majority of +1
>>>>>>> PMC votes are cast, with a minimum of 3 +1 votes.
>>>>>>>
>>>>>>> [ ] +1 Release this package as Apache Spark 4.0.0
>>>>>>> [ ] -1 Do not release this package because ...
>>>>>>>
>>>>>>> To learn more about Apache Spark, please see
>>>>>>> https://spark.apache.org/
>>>>>>>
>>>>>>> The tag to be voted on is v4.0.0-rc4 (commit
>>>>>>> e0801d9d8e33cd8835f3e3beed99a3588c16b776)
>>>>>>> https://github.com/apache/spark/tree/v4.0.0-rc4
>>>>>>>
>>>>>>> The release files, including signatures, digests, etc. can be found
>>>>>>> at:
>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v4.0.0-rc4-bin/
>>>>>>>
>>>>>>> Signatures used for Spark RCs can be found in this file:
>>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>>>
>>>>>>> The staging repository for this release can be found at:
>>>>>>>
>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1480/
>>>>>>>
>>>>>>> The documentation corresponding to this release can be found at:
>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v4.0.0-rc4-docs/
>>>>>>>
>>>>>>> The list of bug fixes going into 4.0.0 can be found at the following
>>>>>>> URL:
>>>>>>> https://issues.apache.org/jira/projects/SPARK/versions/12353359
>>>>>>>
>>>>>>> This release uses the release script of the tag v4.0.0-rc4.
>>>>>>>
>>>>>>> FAQ
>>>>>>>
>>>>>>> =========================
>>>>>>> How can I help test this release?
>>>>>>> =========================
>>>>>>>
>>>>>>> If you are a Spark user, you can help us test this release by taking
>>>>>>> an existing Spark workload, running it on this release candidate, and
>>>>>>> reporting any regressions.
>>>>>>>
>>>>>>> If you're working in PySpark, you can set up a virtual env, install
>>>>>>> the current RC, and see if anything important breaks. In Java/Scala,
>>>>>>> you can add the staging repository to your project's resolvers and
>>>>>>> test with the RC (make sure to clean up the artifact cache before and
>>>>>>> after so you don't end up building with an out-of-date RC going
>>>>>>> forward).
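>>>>>>>
>>>>>>> As a sketch (illustrative, not an official snippet), the staging
>>>>>>> repository above can be added to an sbt project's resolvers along
>>>>>>> these lines:
>>>>>>>
>>>>>>>   // build.sbt: resolve the RC artifacts from the staging repository
>>>>>>>   resolvers += "Spark 4.0.0 RC4 staging" at
>>>>>>>     "https://repository.apache.org/content/repositories/orgapachespark-1480/"
>>>>>>>   // the RC is staged under the final version number, 4.0.0
>>>>>>>   libraryDependencies += "org.apache.spark" %% "spark-sql" % "4.0.0"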
>>>>>>>
>>>>>>
>>>
>>>
>
