" *- distributing libraries with CVE is not a good development practice*" This version of spark is only a minor upgrade of a maintained branch and we have a newer release - 4.0 now for users that need that.
Some time ago I updated FasterXML Jackson to fix one CVE:
https://github.com/apache/spark/pull/40933
It took almost a year before someone had to change it because it broke their system:
https://github.com/apache/spark/pull/49163
So I recommend not bumping the Parquet version. Users who need a newer version of Parquet can upgrade Spark.

On Wed, 28 May 2025 at 04:47, Rozov, Vlad <vro...@amazon.com.invalid> wrote:

> I’ll go with the community vote.
>
> My take:
>
> - the backport is already available, so work was already done (if the
> issue is to open PR with the backport, I can help with that)
> - there is no downside of upgrading parquet dependency to 1.15.2 as 4.0.0
> uses upgraded dependency
> - between 1.13 and 1.15 there are bug fixes that Spark users will benefit
> from. I guess that the similar argument applies to ORC upgrade (
> https://github.com/apache/spark/pull/50813).
> - there are confirmed performance improvements that directly impacts Spark
> - 4.0.0 was just released and it will take some time before it is fully
> tested and adopted for production deployments
> - distributing libraries with CVE is not a good development practice
>
> Thank you,
>
> Vlad
>
> On May 27, 2025, at 4:21 PM, Hyukjin Kwon <gurwls...@apache.org> wrote:
>
> I am fine with backporting if we know that the CVEs actually affect Spark.
> Let's check if one of CVEs actually affects Spark, and create a backport if
> so.
> For improvements, it is generally not backported down to old branches
>
> On Wed, 28 May 2025 at 01:17, Rozov, Vlad <vro...@amazon.com.invalid>
> wrote:
>
>> Hi Dongjoon,
>>
>> > I guess you wanted to propose Apache Parquet 1.15.2 backport instead.
>> Correct, that was my question: "Should parquet version be upgraded to
>> 1.15.1 or 1.15.2? There are 10 CVEs in the current 1.13.1 and even though
>> they may not impact Spark there are other improvements (better performance)
>> that will benefit Spark users.”
>>
>> IMO, it will be beneficial for Spark 3.5.x users to have parquet
>> dependency upgraded to 1.15.2. Even if Spark is not directly impacted by
>> the Parquet CVE-2025-46762 and CVE-2025-30065, as Spark distribution
>> installs vulnerable libraries, it may trigger scanner alerts on end user
>> systems.
>>
>> Should your CR be backported to 3.5 branch and included into the next
>> 3.5.7 release? Another option was to undo the revert and bump parquet
>> version from 1.15.1 to 1.15.2.
>>
>> Thank you,
>>
>> Vlad
>>
>> > On May 26, 2025, at 9:16 AM, Dongjoon Hyun <dongj...@apache.org> wrote:
>> >
>> > To Vlad. This is not correct.
>> >
>> >> the revert can now be undone.
>> >
>> > FYI, Parquet 1.15.1 was reverted not only for the deadlock report, but
>> also I got informed that 1.15.1 was turned out to be insufficient and 1.15.2
>> was in progress in the Apache Parquet community.
>> >
>> > I guess you wanted to propose Apache Parquet 1.15.2 backport instead.
>> For the record, I made both the revert commit and the following SPARK-51950
>> PR. As of now, I don't see any valid reason of reverting of the reverted
>> commit of 1.15.1.
>> >
>> > [SPARK-51950][BUILD] Upgrade Parquet to 1.15.2
>> > https://github.com/apache/spark/pull/50755
>> >
>> > Dongjoon.
>> >
>> > On 2025/05/26 03:08:32 "Rozov, Vlad" wrote:
>> >> There is an existing PR that was reverted due to a deadlock. As
>> deadlock is now fixed, the revert can now be undone.
>> >>
>> >> https://github.com/apache/spark/pull/50528
>> >>
>> https://github.com/apache/spark/commit/eb6cc4c9ee17406cd665991489b6619f5c7689ab
>> >> https://github.com/apache/spark/pull/50810
>> >>
>> >> Thank you,
>> >>
>> >> Vlad
>> >>
>> >> On May 25, 2025, at 6:05 PM, Hyukjin Kwon <gurwls...@apache.org>
>> wrote:
>> >>
>> >> Probably should avoid backporting it for improvements but If there is
>> a CVE that directly affects Spark, let's upgrade.
>> >>
>> >> On Mon, 26 May 2025 at 00:27, Rozov, Vlad <vro...@amazon.com.invalid>
>> wrote:
>> >> Should parquet version be upgraded to 1.15.1 or 1.15.2? There are 10
>> CVEs in the current 1.13.1 and even though they may not impact Spark there
>> are other improvements (better performance) that will benefit Spark users.
>> >>
>> >> Thank you,
>> >>
>> >> Vlad
>> >>
>> >> On May 24, 2025, at 8:02 PM, Hyukjin Kwon <gurwls...@apache.org
>> <mailto:gurwls...@apache.org>> wrote:
>> >>
>> >> Oh let me check. Thanks for letting me know.
>> >>
>> >> On Sun, May 25, 2025 at 12:00 PM Dongjoon Hyun <dongj...@apache.org
>> <mailto:dongj...@apache.org>> wrote:
>> >> I saw 38 commits to make this work. Thank you for driving this,
>> Hyukjin.
>> >>
>> >> BTW, your key seems to be new and is not in
>> https://dist.apache.org/repos/dist/dev/spark/KEYS yet. Could you
>> double-check?
>> >>
>> >> $ curl -LO https://dist.apache.org/repos/dist/dev/spark/KEYS
>> >> $ gpg --import KEYS
>> >> $ gpg --verify spark-3.5.6-bin-hadoop3.tgz.asc
>> >> gpg: assuming signed data in 'spark-3.5.6-bin-hadoop3.tgz'
>> >> gpg: Signature made Thu May 22 23:49:54 2025 PDT
>> >> gpg: using RSA key
>> 0FE4571297AB84440673665669600C8338F65970
>> >> gpg: issuer "gurwls...@apache.org<mailto:
>> gurwls...@apache.org>"
>> >> gpg: Can't check signature: No public key
>> >>
>> >> Dongjoon.
>> >>
>> >> On 2025/05/23 17:56:25 Allison Wang wrote:
>> >>> +1
>> >>>
>> >>> On Fri, May 23, 2025 at 10:15 AM Hyukjin Kwon <gurwls...@apache.org
>> <mailto:gurwls...@apache.org>> wrote:
>> >>>
>> >>>> Oh it's actually a test and also to release. Let me know if you have
>> any
>> >>>> concern!
>> >>>>
>> >>>> On Fri, May 23, 2025 at 11:25 PM Mridul Muralidharan <
>> mri...@gmail.com<mailto:mri...@gmail.com>>
>> >>>> wrote:
>> >>>>
>> >>>>> Hi Hyukjin,
>> >>>>>
>> >>>>> This thread is to test the automated release, right ?
>> >>>>> Not to actually release it ?
>> >>>>>
>> >>>>> Regards,
>> >>>>> Mridul
>> >>>>>
>> >>>>> On Fri, May 23, 2025 at 8:26 AM Ruifeng Zheng <ruife...@apache.org
>> <mailto:ruife...@apache.org>>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> +1
>> >>>>>>
>> >>>>>> On Fri, May 23, 2025 at 5:27 PM Hyukjin Kwon <gurwls...@apache.org
>> <mailto:gurwls...@apache.org>>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>>> Please vote on releasing the following candidate as Apache Spark
>> >>>>>>> version 3.5.6.
>> >>>>>>>
>> >>>>>>> The vote is open until May 27 (PST) and passes if a majority +1
>> PMC
>> >>>>>>> votes are cast, with
>> >>>>>>> a minimum of 3 +1 votes.
>> >>>>>>>
>> >>>>>>> [ ] +1 Release this package as Apache Spark 3.5.6
>> >>>>>>> [ ] -1 Do not release this package because ...
>> >>>>>>>
>> >>>>>>> To learn more about Apache Spark, please see
>> https://spark.apache.org/
>> >>>>>>>
>> >>>>>>> The tag to be voted on is v3.5.6-rc1 (commit
>> >>>>>>> 303c18c74664f161b9b969ac343784c088b47593):
>> >>>>>>>
>> >>>>>>>
>> https://github.com/apache/spark/tree/303c18c74664f161b9b969ac343784c088b47593
>> >>>>>>>
>> >>>>>>> The release files, including signatures, digests, etc. can be
>> found at:
>> >>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.5.6-rc1-bin/
>> >>>>>>>
>> >>>>>>> Signatures used for Spark RCs can be found in this file:
>> >>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>> >>>>>>>
>> >>>>>>> The staging repository for this release can be found at:
>> >>>>>>>
>> https://repository.apache.org/content/repositories/orgapachespark-1495/
>> >>>>>>>
>> >>>>>>> The documentation corresponding to this release can be found at:
>> >>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.5.6-rc1-docs/
>> >>>>>>>
>> >>>>>>> The list of bug fixes going into 3.5.6 can be found at the
>> following
>> >>>>>>> URL:
>> >>>>>>> https://issues.apache.org/jira/projects/SPARK/versions/12355703
>> >>>>>>>
>> >>>>>>> FAQ
>> >>>>>>>
>> >>>>>>> =========================
>> >>>>>>> How can I help test this release?
>> >>>>>>> =========================
>> >>>>>>>
>> >>>>>>> If you are a Spark user, you can help us test this release by
>> taking
>> >>>>>>> an existing Spark workload and running on this release candidate,
>> then
>> >>>>>>> reporting any regressions.
>> >>>>>>>
>> >>>>>>> If you're working in PySpark you can set up a virtual env and
>> install
>> >>>>>>> the current RC via "pip install
>> >>>>>>>
>> https://dist.apache.org/repos/dist/dev/spark/v3.5.6-rc1-bin/pyspark-3.5.6.tar.gz
>> >>>>>>> "
>> >>>>>>> and see if anything important breaks.
>> >>>>>>> In the Java/Scala, you can add the staging repository to your
>> projects
>> >>>>>>> resolvers and test
>> >>>>>>> with the RC (make sure to clean up the artifact cache
>> before/after so
>> >>>>>>> you don't end up building with a out of date RC going forward).
>> >>>>>>>
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org<mailto:
>> dev-unsubscr...@spark.apache.org>
>> >>
>

--
Bjørn Jørgensen
Vestre Aspehaug 4, 6010 Ålesund
Norway

+47 480 94 297