Let's not tie a performance improvement from a library upgrade to a bugfix
release of Apache Spark. The main purpose of a bugfix release is to give
users something they can upgrade to for better safety and reliability.
Upgrading and downgrading Spark can be painful for the environment setup,
and it's not a great user experience to make users downgrade because we
wanted better performance but didn't flag the risk. Your case is not the
same as the example: the example you mentioned bumped only the bugfix
version, not the minor version.

So this is really about the CVE. If the CVE is critical enough, we may need
to perform the most conservative upgrade, i.e. to the minimal version that
contains the fix. If this requires bumping a minor version, we'd first like
to know whether a fix on the current version line is really a closed door
(is that line explicitly EOLed?).
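
To make "most conservative" concrete, here is a rough sketch of the rule I
have in mind. The function names, the numeric-only version parsing, and the
sample version list are my own assumptions for illustration; none of this is
Spark or Parquet tooling.

```python
def parse(version):
    """Turn a dotted numeric version such as '1.13.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def most_conservative_upgrade(current, fixed_versions):
    """Pick the lowest version above `current` that contains the fix.

    Returns (version, needs_minor_bump), or None if nothing newer is listed.
    needs_minor_bump is True when the major.minor line changes.
    """
    cur = parse(current)
    newer = [v for v in fixed_versions if parse(v) > cur]
    if not newer:
        return None
    target = min(newer, key=parse)
    needs_minor_bump = parse(target)[:2] != cur[:2]
    return target, needs_minor_bump


# Illustrative input only: which releases actually carry a given fix has to
# come from the upstream advisory, not from this list.
print(most_conservative_upgrade("1.13.1", ["1.15.2", "1.15.1"]))
```

When the second value comes back True, that is the case where I'd first ask
whether a fix on the current version line is really off the table.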



On Wed, May 28, 2025 at 11:46 AM Rozov, Vlad <vro...@amazon.com.invalid>
wrote:

> I’ll go with the community vote.
>
> My take:
>
> - the backport is already available, so the work is already done (if the
> issue is opening a PR with the backport, I can help with that)
> - there is no downside to upgrading the parquet dependency to 1.15.2, as
> 4.0.0 already uses the upgraded dependency
> - between 1.13 and 1.15 there are bug fixes that Spark users will benefit
> from. I guess a similar argument applies to the ORC upgrade
> (https://github.com/apache/spark/pull/50813).
> - there are confirmed performance improvements that directly impact Spark
> - 4.0.0 was just released, and it will take some time before it is fully
> tested and adopted for production deployments
> - distributing libraries with CVEs is not a good development practice
>
> Thank you,
>
> Vlad
>
> On May 27, 2025, at 4:21 PM, Hyukjin Kwon <gurwls...@apache.org> wrote:
>
> I am fine with backporting if we know that the CVEs actually affect Spark.
> Let's check whether any of the CVEs actually affects Spark, and create a
> backport if so.
> Improvements are generally not backported to old branches.
>
> On Wed, 28 May 2025 at 01:17, Rozov, Vlad <vro...@amazon.com.invalid>
> wrote:
>
>> Hi Dongjoon,
>>
>> > I guess you wanted to propose an Apache Parquet 1.15.2 backport instead.
>> Correct, that was my question: "Should parquet version be upgraded to
>> 1.15.1 or 1.15.2? There are 10 CVEs in the current 1.13.1 and even though
>> they may not impact Spark there are other improvements (better performance)
>> that will benefit Spark users."
>>
>> IMO, it will be beneficial for Spark 3.5.x users to have the parquet
>> dependency upgraded to 1.15.2. Even if Spark is not directly impacted by
>> the Parquet CVE-2025-46762 and CVE-2025-30065, the Spark distribution
>> installs the vulnerable libraries, so it may trigger scanner alerts on
>> end-user systems.
>>
>> Should your CR be backported to the 3.5 branch and included in the next
>> release, 3.5.7? Another option would be to undo the revert and bump the
>> parquet version from 1.15.1 to 1.15.2.
>>
>> Thank you,
>>
>> Vlad
>>
>> > On May 26, 2025, at 9:16 AM, Dongjoon Hyun <dongj...@apache.org> wrote:
>> >
>> > To Vlad. This is not correct.
>> >
>> >> the revert can now be undone.
>> >
>> > FYI, Parquet 1.15.1 was reverted not only because of the deadlock report;
>> > I was also informed that 1.15.1 turned out to be insufficient and that
>> > 1.15.2 was in progress in the Apache Parquet community.
>> >
>> > I guess you wanted to propose an Apache Parquet 1.15.2 backport instead.
>> > For the record, I made both the revert commit and the following
>> > SPARK-51950 PR. As of now, I don't see any valid reason to undo the
>> > revert of the 1.15.1 commit.
>> >
>> > [SPARK-51950][BUILD] Upgrade Parquet to 1.15.2
>> > https://github.com/apache/spark/pull/50755
>> >
>> > Dongjoon.
>> >
>> > On 2025/05/26 03:08:32 "Rozov, Vlad" wrote:
>> >> There is an existing PR that was reverted due to a deadlock. As the
>> >> deadlock is now fixed, the revert can now be undone.
>> >>
>> >> https://github.com/apache/spark/pull/50528
>> >> https://github.com/apache/spark/commit/eb6cc4c9ee17406cd665991489b6619f5c7689ab
>> >> https://github.com/apache/spark/pull/50810
>> >>
>> >> Thank you,
>> >>
>> >> Vlad
>> >>
>> >> On May 25, 2025, at 6:05 PM, Hyukjin Kwon <gurwls...@apache.org> wrote:
>> >>
>> >> We should probably avoid backporting it for improvements alone, but if
>> >> there is a CVE that directly affects Spark, let's upgrade.
>> >>
>> >> On Mon, 26 May 2025 at 00:27, Rozov, Vlad <vro...@amazon.com.invalid>
>> >> wrote:
>> >> Should parquet version be upgraded to 1.15.1 or 1.15.2? There are 10
>> >> CVEs in the current 1.13.1 and even though they may not impact Spark
>> >> there are other improvements (better performance) that will benefit
>> >> Spark users.
>> >>
>> >> Thank you,
>> >>
>> >> Vlad
>> >>
>> >> On May 24, 2025, at 8:02 PM, Hyukjin Kwon <gurwls...@apache.org> wrote:
>> >>
>> >> Oh let me check. Thanks for letting me know.
>> >>
>> >> On Sun, May 25, 2025 at 12:00 PM Dongjoon Hyun <dongj...@apache.org> wrote:
>> >> I saw 38 commits to make this work. Thank you for driving this, Hyukjin.
>> >>
>> >> BTW, your key seems to be new and is not in
>> >> https://dist.apache.org/repos/dist/dev/spark/KEYS yet. Could you
>> >> double-check?
>> >>
>> >> $ curl -LO https://dist.apache.org/repos/dist/dev/spark/KEYS
>> >> $ gpg --import KEYS
>> >> $ gpg --verify spark-3.5.6-bin-hadoop3.tgz.asc
>> >> gpg: assuming signed data in 'spark-3.5.6-bin-hadoop3.tgz'
>> >> gpg: Signature made Thu May 22 23:49:54 2025 PDT
>> >> gpg:                using RSA key 0FE4571297AB84440673665669600C8338F65970
>> >> gpg:                issuer "gurwls...@apache.org"
>> >> gpg: Can't check signature: No public key
>> >>
>> >> Dongjoon.
>> >>
>> >> On 2025/05/23 17:56:25 Allison Wang wrote:
>> >>> +1
>> >>>
>> >>> On Fri, May 23, 2025 at 10:15 AM Hyukjin Kwon <gurwls...@apache.org> wrote:
>> >>>
>> >>>> Oh, it's actually both a test and a real release. Let me know if you
>> >>>> have any concerns!
>> >>>>
>> >>>> On Fri, May 23, 2025 at 11:25 PM Mridul Muralidharan <mri...@gmail.com>
>> >>>> wrote:
>> >>>>
>> >>>>> Hi Hyukjin,
>> >>>>>
>> >>>>> This thread is to test the automated release, right?
>> >>>>> Not to actually release it?
>> >>>>>
>> >>>>> Regards,
>> >>>>> Mridul
>> >>>>>
>> >>>>> On Fri, May 23, 2025 at 8:26 AM Ruifeng Zheng <ruife...@apache.org>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> +1
>> >>>>>>
>> >>>>>> On Fri, May 23, 2025 at 5:27 PM Hyukjin Kwon <gurwls...@apache.org>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>>> Please vote on releasing the following candidate as Apache Spark
>> >>>>>>> version 3.5.6.
>> >>>>>>>
>> >>>>>>> The vote is open until May 27 (PST) and passes if a majority of +1
>> >>>>>>> PMC votes are cast, with a minimum of 3 +1 votes.
>> >>>>>>>
>> >>>>>>> [ ] +1 Release this package as Apache Spark 3.5.6
>> >>>>>>> [ ] -1 Do not release this package because ...
>> >>>>>>>
>> >>>>>>> To learn more about Apache Spark, please see
>> >>>>>>> https://spark.apache.org/
>> >>>>>>>
>> >>>>>>> The tag to be voted on is v3.5.6-rc1 (commit
>> >>>>>>> 303c18c74664f161b9b969ac343784c088b47593):
>> >>>>>>>
>> >>>>>>> https://github.com/apache/spark/tree/303c18c74664f161b9b969ac343784c088b47593
>> >>>>>>>
>> >>>>>>> The release files, including signatures, digests, etc. can be
>> >>>>>>> found at:
>> >>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.5.6-rc1-bin/
>> >>>>>>>
>> >>>>>>> Signatures used for Spark RCs can be found in this file:
>> >>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>> >>>>>>>
>> >>>>>>> The staging repository for this release can be found at:
>> >>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1495/
>> >>>>>>>
>> >>>>>>> The documentation corresponding to this release can be found at:
>> >>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.5.6-rc1-docs/
>> >>>>>>>
>> >>>>>>> The list of bug fixes going into 3.5.6 can be found at the
>> >>>>>>> following URL:
>> >>>>>>> https://issues.apache.org/jira/projects/SPARK/versions/12355703
>> >>>>>>>
>> >>>>>>> FAQ
>> >>>>>>>
>> >>>>>>> =========================
>> >>>>>>> How can I help test this release?
>> >>>>>>> =========================
>> >>>>>>>
>> >>>>>>> If you are a Spark user, you can help us test this release by taking
>> >>>>>>> an existing Spark workload, running it on this release candidate, and
>> >>>>>>> then reporting any regressions.
>> >>>>>>>
>> >>>>>>> If you're working in PySpark, you can set up a virtual env and install
>> >>>>>>> the current RC via "pip install
>> >>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.5.6-rc1-bin/pyspark-3.5.6.tar.gz"
>> >>>>>>> and see if anything important breaks.
>> >>>>>>> In Java/Scala, you can add the staging repository to your project's
>> >>>>>>> resolvers and test with the RC (make sure to clean up the artifact
>> >>>>>>> cache before/after so you don't end up building with an out-of-date RC
>> >>>>>>> going forward).
>> >>>>>>>
>> >>>>>>
>> >>>
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
