Made a PR first (https://github.com/apache/spark/pull/50378).

BTW, I agree that we should have the source code along with the jars, and
ideally the dev branch should not contain them either. This is technical
debt, and I hope we can address it incrementally.

I will also take a look and see if we can reject jars automatically in PRs
or CI.
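
A minimal sketch of what such an automatic check could look like, assuming a
CI step that runs a small Python script against either a checkout or a source
tarball. The function names (find_jars_in_tree, find_jars_in_tarball, check)
are hypothetical and are not part of the Spark build; this only illustrates
the idea of failing the job when any jar is found.

```python
import tarfile
from pathlib import Path


def find_jars_in_tree(root):
    """Return sorted relative paths of all *.jar files under a directory."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*.jar")
        if p.is_file()
    )


def find_jars_in_tarball(tarball):
    """Return sorted member names ending in .jar inside a source tarball."""
    with tarfile.open(tarball) as tar:
        return sorted(
            m.name
            for m in tar.getmembers()
            if m.isfile() and m.name.endswith(".jar")
        )


def check(target):
    """Return 1 if any jar is found in the tree or tarball, else 0."""
    target = Path(target)
    if target.is_dir():
        jars = find_jars_in_tree(target)
    else:
        jars = find_jars_in_tarball(target)
    for jar in jars:
        print(f"unexpected jar: {jar}")
    return 1 if jars else 0
```

A CI job could pass the return value of check(".") to sys.exit, so that a PR
adding a jar fails; the same call on a release tarball would cover the source
release case.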


On Tue, 25 Mar 2025 at 11:15, Hyukjin Kwon <gurwls...@apache.org> wrote:

> So the issue is that source releases (https://github.com/apache/spark/tags)
> contain those jars, right? Can we add the removal of test jars as
> part of the release process?
>
> They aren't included in binary releases in any event so removal on every
> source release should work.
>
> On Tue, 25 Mar 2025 at 10:51, Jungtaek Lim <kabhwan.opensou...@gmail.com>
> wrote:
>
>> Let's make this very clear - do we not have the source code to build a jar,
>> or do we have no way to infer the source code used for the jar?
>>
>> I understand the concern, but if this is a huge issue, why has no one
>> looked into this, and why are we just debating whether the affected tests
>> need to be dropped/disabled or not? Whenever we add test resources like a
>> golden file, we tend to keep the code that builds the golden
>> file. Did we check and confirm that this is not the case for these jars,
>> and that we lost the source code to build them?
>>
>> On Tue, Mar 25, 2025 at 9:35 AM Rozov, Vlad <vro...@amazon.com.invalid>
>> wrote:
>>
>>> First of all, I don’t think that the conclusion on
>>> https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k is
>>> correct. Jar files included in the source release are compiled from
>>> code, and replacing them with dat or jpeg files won’t work. Including jar
>>> files in the source release is against ASF policy, and my -1 will stay as
>>> long as jars are included in the source release. As this issue has been
>>> raised before with no action taken (actually, more jars were
>>> added), IMO, the issue should now be handled as a release blocker.
>>>
>>> I don’t see anything in the proposal that suggests that the fix
>>> for SPARK-51318 is or should be blocked by an umbrella JIRA. The proposal was
>>> to recover tests one by one. The PR that I have open will make it possible to
>>> accomplish these tasks, as all disabled tests refer to SPARK-51318.
>>>
>>> I can only help with SPARK-51318 at this point. Somebody else will have
>>> to look into keeping tests enabled as it requires source code for the test
>>> jars.
>>>
>>> Thank you,
>>>
>>> Vlad
>>>
>>>
>>> On Mar 24, 2025, at 4:55 PM, Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>
>>> I still disagree with just disabling tests and removing the jars without
>>> making sure that we will re-enable them.
>>> I want to EITHER make sure we have a plan and someone to drive it, so that
>>> the tests will be re-enabled, OR have one fix that does it all.
>>> Otherwise, my -1 stands if we can't be sure of that.
>>>
>>> On Tue, 25 Mar 2025 at 08:51, Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>
>>>> From what I read in the last discussion in the legal thread (
>>>> https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k), we
>>>> don't really need to rush and block the release.
>>>> I don't think we should block the release, remove the CI, and just
>>>> remove the jars.
>>>>
>>>> Rozov, the original proposal of this thread is 1. to first disable the
>>>> tests, and 2. to open an umbrella JIRA to enable individual tests.
>>>> Since you're driving this, would you mind either making a proper fix in
>>>> one go, or creating an umbrella JIRA to drive this?
>>>>
>>>>
>>>> On Mon, 24 Mar 2025 at 23:46, Rozov, Vlad <vro...@amazon.com.invalid>
>>>> wrote:
>>>>
>>>>> Let’s open a formal vote on the subject. I have an open WIP PR,
>>>>> https://github.com/apache/spark/pull/50231, that is currently blocked
>>>>> by a -1.
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Vlad
>>>>>
>>>>> On Mar 24, 2025, at 7:05 AM, Wenchen Fan <cloud0...@gmail.com> wrote:
>>>>>
>>>>>
>>>>> It seems there’s no quick fix for this issue. Should we remove these
>>>>> jars and disable the tests for now to comply with ASF policy? While this
>>>>> would temporarily reduce test coverage until we refactor the tests to avoid
>>>>> pre-compiled jars, we can encourage Spark vendors not to cherry-pick this
>>>>> test-disabling commit so they can help report any test failures. That said,
>>>>> since these tests are quite old and stable, failures are unlikely.
>>>>>
>>>>> Thanks,
>>>>> Wenchen
>>>>>
>>>>> On Thu, Mar 13, 2025 at 12:15 AM Rozov, Vlad <vro...@amazon.com.invalid>
>>>>> wrote:
>>>>>
>>>>>> There is a difference between technical debt and a legal issue. The ASF
>>>>>> may request that a release that does not meet ASF policy be pulled (and
>>>>>> having tests is not ASF policy). IMO, SPARK-51318 should be a blocker for
>>>>>> the next release or handled like a blocker.
>>>>>>
>>>>>> Thank you,
>>>>>>
>>>>>> Vlad
>>>>>>
>>>>>> On Mar 10, 2025, at 6:02 PM, Jungtaek Lim <
>>>>>> kabhwan.opensou...@gmail.com> wrote:
>>>>>>
>>>>>> +1 to Hyukjin. If the test is effective, we should definitely retain
>>>>>> the effectiveness of the test, unless we end up with the conclusion that
>>>>>> there is no way to do that.
>>>>>>
>>>>>> On Tue, Mar 11, 2025 at 9:29 AM Hyukjin Kwon <gurwls...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> If we should fix this, let's make sure we don't just disable the tests -
>>>>>>> otherwise we will create another set of technical debt.
>>>>>>>
>>>>>>>
>>>>>>> On Thu, 27 Feb 2025 at 09:11, Rozov, Vlad <vro...@amazon.com.invalid>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I’ll look into the JIRA. Please assign it to me.
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>>
>>>>>>>> Vlad
>>>>>>>>
>>>>>>>> > On Feb 26, 2025, at 11:33 PM, Yang Jie <yangji...@apache.org> wrote:
>>>>>>>> >
>>>>>>>> > +1, Agree to remove the jar files from the Apache Spark
>>>>>>>> > repository and disable the affected tests.
>>>>>>>> >
>>>>>>>> > For the current test scenarios that use jar files, I believe we
>>>>>>>> > can definitely find a more reasonable testing approach.
>>>>>>>> >
>>>>>>>> > Thanks,
>>>>>>>> > Jie Yang
>>>>>>>> >
>>>>>>>> > On 2025/02/26 16:57:45 "Rozov, Vlad" wrote:
>>>>>>>> >> +1 on fixing test jars, though how it is fixed needs to
>>>>>>>> >> be discussed, IMO. In the short term, removing jars may still be the
>>>>>>>> >> best option to satisfy ASF legal policy and avoid release removal.
>>>>>>>> >>
>>>>>>>> >> AFAIK, the ASF mandates that users and developers have the source code
>>>>>>>> >> that they build from (source release), not that they run (binary
>>>>>>>> >> release).
>>>>>>>> >>
>>>>>>>> >> Thank you,
>>>>>>>> >>
>>>>>>>> >> Vlad
>>>>>>>> >>
>>>>>>>> >>> On Feb 26, 2025, at 8:47 AM, Dongjoon Hyun <dongj...@apache.org> wrote:
>>>>>>>> >>>
>>>>>>>> >>> Thank you for your reply, Sean.
>>>>>>>> >>>
>>>>>>>> >>> I expected exactly that argument, which is why I started by quoting
>>>>>>>> >>> your sentence above.
>>>>>>>> >>>
>>>>>>>> >>> I understood the reasoning in 2018. However, there are two
>>>>>>>> >>> reasons why I brought this up again in 2025:
>>>>>>>> >>>
>>>>>>>> >>> First, the open source spirit is technically and literally "no
>>>>>>>> >>> compiled code in a source release", as the Apache Hadoop and Hive
>>>>>>>> >>> communities practice. Justin, Vlad, and Alex shared the same
>>>>>>>> >>> perspective with the Apache Spark PMC.
>>>>>>>> >>>
>>>>>>>> >>> $ tar tvf apache-hive-4.0.1-src.tar.gz | grep 'jar$' | wc -l
>>>>>>>> >>>      0
>>>>>>>> >>> $ tar tvfz hadoop-3.4.1-src.tar.gz | grep 'jar$' | wc -l
>>>>>>>> >>>      0
>>>>>>>> >>>
>>>>>>>> >>> Second, last year, open source communities worldwide were hit by
>>>>>>>> >>> CVE-2024-3094 ("XZ Utils Backdoor"), where the backdoor was hidden
>>>>>>>> >>> in a test object. I believe most of us are aware of that. At that
>>>>>>>> >>> time, the GitHub repository was disabled. As a member of the
>>>>>>>> >>> Apache Spark PMC, I'm suggesting that we remove that risk from the
>>>>>>>> >>> Apache Spark repository in 2025. I attached the following link to
>>>>>>>> >>> provide the XZ Utils history explicitly.
>>>>>>>> >>>
>>>>>>>> >>> https://www.akamai.com/blog/security-research/critical-linux-backdoor-xz-utils-discovered-what-to-know
>>>>>>>> >>>
>>>>>>>> >>> Although I agree that this test coverage is important, I
>>>>>>>> >>> don't think it's worth the Apache Spark community taking the risk of
>>>>>>>> >>> being shut down. That's the lesson I learned last year.
>>>>>>>> >>>
>>>>>>>> >>> Sincerely,
>>>>>>>> >>> Dongjoon.
>>>>>>>> >>>
>>>>>>>> >>> On 2025/02/26 13:31:56 Sean Owen wrote:
>>>>>>>> >>>> The gist of the initial 2018 thread was:
>>>>>>>> >>>> These are not source .jar files that users use, but .jar files
>>>>>>>> >>>> used to test loading from .jar files. These are test resources only.
>>>>>>>> >>>> I don't think this is what the spirit of the rule is speaking
>>>>>>>> >>>> to, that the end-user code should always have source code, which is
>>>>>>>> >>>> the right principle.
>>>>>>>> >>>> Checking in the code somewhere is nice to have though and I
>>>>>>>> >>>> think that was the idea here.
>>>>>>>> >>>>
>>>>>>>> >>>> But, removing these and disabling potentially valuable tests
>>>>>>>> >>>> seems like a step too far. There is no actual 'problem' w.r.t. the
>>>>>>>> >>>> principle that users have source to the code they run.
>>>>>>>> >>>>
>>>>>>>> >>>> The 2025 thread just retreads the same ground as the 2018
>>>>>>>> >>>> thread.
>>>>>>>> >>>> But I don't see that we put this argument to the person who
>>>>>>>> >>>> raised it again. Why not that first?
>>>>>>>> >>>> And, if possible, go stick the source to these jars in the
>>>>>>>> >>>> source tree, where available.
>>>>>>>> >>>>
>>>>>>>> >>>>
>>>>>>>> >>>> On Wed, Feb 26, 2025 at 1:08 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
>>>>>>>> >>>> wrote:
>>>>>>>> >>>>
>>>>>>>> >>>>> Hi, All.
>>>>>>>> >>>>>
>>>>>>>> >>>>> Unfortunately, the Apache Spark project seems to have
>>>>>>>> >>>>> technical debt in the source code releases. It has been discussed
>>>>>>>> >>>>> at least twice on both the dev@spark and legal-discuss mailing
>>>>>>>> >>>>> lists. (Thank you for the heads-up, Vlad.)
>>>>>>>> >>>>>
>>>>>>>> >>>>> 1. https://lists.apache.org/thread/3sxw9gwp51mrkzlo2xchq1g20gbgbnz8
>>>>>>>> >>>>> (2018-06-21, dev@spark)
>>>>>>>> >>>>> 2. https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k
>>>>>>>> >>>>> (2018-06-25, legal-discuss@)
>>>>>>>> >>>>> 3. https://lists.apache.org/thread/z3oq1db80vc8c7r6892hwjnq4h7hnwmd
>>>>>>>> >>>>> (2025-02-25, dev@spark)
>>>>>>>> >>>>>
>>>>>>>> >>>>> In short, according to the previous conclusion in 2018,
>>>>>>>> >>>>> the Apache Spark community wanted to adhere to the ASF policy by
>>>>>>>> >>>>> removing those jar files from source code releases (although it
>>>>>>>> >>>>> was not considered a release blocker at that time or since).
>>>>>>>> >>>>>
>>>>>>>> >>>>>> it's important to be able to recreate these JARs somehow,
>>>>>>>> >>>>>> and I don't think we have the source in the repo for all of
>>>>>>>> >>>>>> them (at least, the ones that originate from Spark).
>>>>>>>> >>>>>> That much seems like a must-do. After that, seems worth
>>>>>>>> >>>>>> figuring out just how hard it is to build these artifacts from source.
>>>>>>>> >>>>>> If it's easy, great. If not, either the test can be removed
>>>>>>>> >>>>>> or we figure out just how hard a requirement this is.
>>>>>>>> >>>>>
>>>>>>>> >>>>> Given that the issue has been unresolved for seven years, I proposed
>>>>>>>> >>>>> SPARK-51318 as a potential solution to comply with ASF policy. After
>>>>>>>> >>>>> SPARK-51318, we can recover the test coverage one by one later by
>>>>>>>> >>>>> addressing IDed TODO items without any legal concerns during the votes.
>>>>>>>> >>>>>
>>>>>>>> >>>>> https://issues.apache.org/jira/browse/SPARK-51318
>>>>>>>> >>>>> (Remove `jar` files from Apache Spark repository and disable
>>>>>>>> >>>>> affected tests)
>>>>>>>> >>>>>
>>>>>>>> >>>>> WDYT?
>>>>>>>> >>>>>
>>>>>>>> >>>>> BTW, please note that I didn't define SPARK-51318 as a
>>>>>>>> >>>>> blocker for any on-going releases yet.
>>>>>>>> >>>>>
>>>>>>>> >>>>> Best regards,
>>>>>>>> >>>>> Dongjoon.
>>>>>>>> >>>>>
>>>>>>>> >>>>
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>>>>> >>>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>
