One more small fix (on another topic) for the next RC: https://github.com/apache/spark/pull/50685
Thanks!
Szehon

On Tue, Apr 22, 2025 at 10:07 AM Rozov, Vlad <vro...@amazon.com.invalid> wrote:

Correct, to me it looks like a Spark bug, https://issues.apache.org/jira/browse/SPARK-51821, that may be hard to trigger and is reproducible using the test case provided in https://github.com/apache/spark/pull/50594:

1. The Spark UninterruptibleThread "task" is interrupted by the "test" thread while the "task" thread is blocked in an NIO operation.
2. The NIO operation is interruptible (the channel is an InterruptibleChannel). In the case of Parquet, it is a WritableByteChannel.
3. As part of handling the InterruptedException, the channel interrupts the "task" thread (https://github.com/apache/hadoop/blob/5770647dc73d552819963ba33f50be518058ee03/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java#L1029).

Thank you,

Vlad

On Apr 22, 2025, at 1:53 AM, Wenchen Fan <cloud0...@gmail.com> wrote:

Correct me if I'm wrong: this is a long-standing Spark bug that is very hard to trigger, but the new Parquet version happens to hit the trigger condition and exposes the bug. If this is the case, I'm +1 to fix the Spark bug instead of downgrading the Parquet version.

Let's move the technical discussions to https://github.com/apache/spark/pull/50594.

On Tue, Apr 22, 2025 at 11:20 AM Manu Zhang <owenzhang1...@gmail.com> wrote:

I don't think PARQUET-2432 has any issue itself. It looks to have triggered a deadlock case like https://github.com/apache/spark/pull/50594.

I'd suggest that we fix forward if possible.

Thanks,
Manu

On Mon, Apr 21, 2025 at 11:19 PM Rozov, Vlad <vro...@amazon.com.invalid> wrote:

The deadlock is reproducible without Parquet. Please see https://github.com/apache/spark/pull/50594.
Thank you,

Vlad

On Apr 21, 2025, at 1:59 AM, Cheng Pan <pan3...@gmail.com> wrote:

The deadlock is introduced by PARQUET-2432 (1.14.0); if we decide to downgrade, the latest workable version is Parquet 1.13.1.

Thanks,
Cheng Pan

On Apr 21, 2025, at 16:53, Wenchen Fan <cloud0...@gmail.com> wrote:

+1 to downgrade to Parquet 1.15.0 for Spark 4.0. According to https://github.com/apache/spark/pull/50583#issuecomment-2815243571, the Parquet CVE does not affect Spark.

On Mon, Apr 21, 2025 at 2:45 PM Hyukjin Kwon <gurwls...@apache.org> wrote:

That's nice, but we need to wait for them to release, and then upgrade, right? Let's revert the Parquet upgrade out of the 4.0 branch since we're not directly affected by the CVE anyway.

On Mon, 21 Apr 2025 at 15:42, Yuming Wang <yumw...@apache.org> wrote:

It seems this patch (https://github.com/apache/parquet-java/pull/3196) can avoid the deadlock issue if using Parquet 1.15.1.

On Wed, Apr 16, 2025 at 5:39 PM Niranjan Jayakar <n...@databricks.com.invalid> wrote:

I found another bug introduced in 4.0 that breaks Spark Connect client x server compatibility: https://github.com/apache/spark/pull/50604.

Once merged, this should be included in the next RC.

On Thu, Apr 10, 2025 at 5:21 PM Wenchen Fan <cloud0...@gmail.com> wrote:

Please vote on releasing the following candidate as Apache Spark version 4.0.0.

The vote is open until April 15 (PST) and passes if a majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 4.0.0
[ ] -1 Do not release this package because ...
To learn more about Apache Spark, please see https://spark.apache.org/

The tag to be voted on is v4.0.0-rc4 (commit e0801d9d8e33cd8835f3e3beed99a3588c16b776):
https://github.com/apache/spark/tree/v4.0.0-rc4

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v4.0.0-rc4-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1480/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v4.0.0-rc4-docs/

The list of bug fixes going into 4.0.0 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12353359

This release is using the release script of the tag v4.0.0-rc4.

FAQ

=========================
How can I help test this release?
=========================

If you are a Spark user, you can help us test this release by taking an existing Spark workload and running it on this release candidate, then reporting any regressions.

If you're working in PySpark, you can set up a virtual env, install the current RC, and see if anything important breaks. In Java/Scala, you can add the staging repository to your project's resolvers and test with the RC (make sure to clean up the artifact cache before/after so you don't end up building with an out-of-date RC going forward).
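For readers following the deadlock discussion earlier in the thread: steps 2 and 3 of Vlad's description hinge on a documented JDK behavior — when a thread is interrupted while blocked in I/O on an InterruptibleChannel, the channel is closed, the thread receives a ClosedByInterruptException, and its interrupt status is set again on the way out. The minimal, self-contained sketch below demonstrates just that channel behavior (the class name and the use of java.nio.channels.Pipe to obtain a blocking write are my own; this is not the Spark/UninterruptibleThread repro from the PR):

```java
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.Pipe;

public class InterruptDemo {
    public static void main(String[] args) throws Exception {
        Pipe pipe = Pipe.open();
        Thread task = new Thread(() -> {
            ByteBuffer buf = ByteBuffer.allocate(64 * 1024);
            try {
                // Keep writing until the pipe's OS buffer fills and
                // write() blocks inside the interruptible channel.
                while (true) {
                    buf.clear();
                    pipe.sink().write(buf);
                }
            } catch (ClosedByInterruptException e) {
                // The interrupt closed the channel; per the JDK spec the
                // thread's interrupt status is re-set before this is thrown.
                System.out.println("interrupted flag: "
                        + Thread.currentThread().isInterrupted());
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        task.start();
        Thread.sleep(500);   // give the writer time to block
        task.interrupt();    // interrupt while blocked in the NIO write
        task.join();
    }
}
```

The re-set interrupt flag is the subtle part: code that catches the exception and retries (as the linked DataStreamer code path does by re-interrupting) can bounce the interrupt back into a thread that Spark's UninterruptibleThread machinery believes is uninterruptible, which is the trigger condition discussed above.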