Made a PR first (https://github.com/apache/spark/pull/50378).
BTW, I agree that we should have the source code along with the jars, and ideally the dev branch should not contain the jars either. This is technical debt, and I hope we can improve it incrementally. I will also take a look and see if we can reject jars automatically in PRs or CI.

On Tue, 25 Mar 2025 at 11:15, Hyukjin Kwon <gurwls...@apache.org> wrote:

> So the issues are source releases (https://github.com/apache/spark/tags)
> containing those jars, right? Can we add the removal of test jars at the
> part of the release process.
>
> They aren't included in binary releases in any event so removal on every
> source release should work.
>
> On Tue, 25 Mar 2025 at 10:51, Jungtaek Lim <kabhwan.opensou...@gmail.com>
> wrote:
>
>> Let's make this very clear - do we not have a source code to build a jar,
>> or have no way to infer the source code being used for the jar?
>>
>> I understand the concern, but if this is a huge issue, why no one has
>> looked into this and here we just debate whether the affected tests need to
>> be dropped/disabled or not? Whenever we add some test resources like a
>> golden file, we tend to leave the part of the code to build the golden
>> file. Did we check and confirm these jars are not the case and we lost the
>> source code to build?
>>
>> On Tue, Mar 25, 2025 at 9:35 AM Rozov, Vlad <vro...@amazon.com.invalid>
>> wrote:
>>
>>> First of all I don’t think that conclusion on the
>>> https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k is
>>> correct. Jar files included into the source release are compiled from the
>>> code and replacing them with dat or jpeg files won’t work. Including jar
>>> files into the source release is against ASF policy and my -1 will stay as
>>> long as jars are included into the source release. As this issue was raised
>>> not for the first time and there was no action (actually more jars were
>>> added), IMO, the issue should now be handled as the release blocker.
>>>
>>> I don’t see anything in the proposal that suggests that fix
>>> for SPARK-51318 is or should be blocked by umbrella JIRA. The proposal was
>>> to recover tests one by one. The PR that I have open will allow to
>>> accomplish these tasks as all disabled tests refer to SPARK-51318.
>>>
>>> I can only help with SPARK-51318 at this point. Somebody else will have
>>> to look into keeping tests enabled as it requires source code for the test
>>> jars.
>>>
>>> Thank you,
>>>
>>> Vlad
>>>
>>>
>>> On Mar 24, 2025, at 4:55 PM, Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>
>>> I still disagree with just disabling tests and removing the jars without
>>> making sure that we will enable them back.
>>> I want to EITHER make sure we have a plan and someone to drive, and the
>>> tests will be enabled back, OR have a one fix that does all.
>>> Otherwise, my -1 stands if we can't be sure of that.
>>>
>>> On Tue, 25 Mar 2025 at 08:51, Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>
>>>> From what I read in the last discussion in the legal thread (
>>>> https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k), we
>>>> don't really need to rush and block the release.
>>>> I don't think we should block the release, remove the CI, and just
>>>> remove the jars.
>>>>
>>>> Rozov, the original proposal of this thread is 1. to first disable the
>>>> tests, and 2. open an umbrella JIRA to enable individual tests.
>>>> Since you're driving this, would you mind either making a proper fix in
>>>> one go, or create an umbrella JIRA to drive this?
>>>>
>>>>
>>>> On Mon, 24 Mar 2025 at 23:46, Rozov, Vlad <vro...@amazon.com.invalid>
>>>> wrote:
>>>>
>>>>> Let’s open a formal vote on the subject. I have open WIP PR
>>>>> https://github.com/apache/spark/pull/50231 that is currently blocked
>>>>> by -1.
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Vlad
>>>>>
>>>>> On Mar 24, 2025, at 7:05 AM, Wenchen Fan <cloud0...@gmail.com> wrote:
>>>>>
>>>>>
>>>>> It seems there’s no quick fix for this issue. Should we remove these
>>>>> jars and disable the tests for now to comply with ASF policy? While this
>>>>> would temporarily reduce test coverage until we refactor the tests to avoid
>>>>> pre-compiled jars, we can encourage Spark vendors not to cherry-pick this
>>>>> test-disabling commit so they can help report any test failures. That said,
>>>>> since these tests are quite old and stable, failures are unlikely.
>>>>>
>>>>> Thanks,
>>>>> Wenchen
>>>>>
>>>>> On Thu, Mar 13, 2025 at 12:15 AM Rozov, Vlad <vro...@amazon.com.invalid>
>>>>> wrote:
>>>>>
>>>>>> There is a difference between technical debt and legal issue. ASF may
>>>>>> request to pull out release that does not meet ASF policy (and having tests
>>>>>> is not ASF policy). IMO, SPARK-51318 should be a blocker for the next
>>>>>> release or handled like a blocker.
>>>>>>
>>>>>> Thank you,
>>>>>>
>>>>>> Vlad
>>>>>>
>>>>>> On Mar 10, 2025, at 6:02 PM, Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>>>>>>
>>>>>> +1 to Hyukjin. If the test is effective, we should definitely retain
>>>>>> the effectiveness of the test, unless we end up with the conclusion that
>>>>>> there is no way to do that.
>>>>>>
>>>>>> On Tue, Mar 11, 2025 at 9:29 AM Hyukjin Kwon <gurwls...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> If we should fix, let's make sure we don't just disable the tests -
>>>>>>> we will create another set of technical debt.
>>>>>>>
>>>>>>>
>>>>>>> On Thu, 27 Feb 2025 at 09:11, Rozov, Vlad <vro...@amazon.com.invalid>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I’ll look into the JIRA. Please assign it to me.
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>>
>>>>>>>> Vlad
>>>>>>>>
>>>>>>>> > On Feb 26, 2025, at 11:33 PM, Yang Jie <yangji...@apache.org> wrote:
>>>>>>>> >
>>>>>>>> > +1, Agree to remove the jar files from the Apache Spark repository and disable the affected tests.
>>>>>>>> >
>>>>>>>> > For the current test scenarios that use jar files, I believe we can definitely find a more reasonable testing approach.
>>>>>>>> >
>>>>>>>> > Thanks,
>>>>>>>> > Jie Yang
>>>>>>>> >
>>>>>>>> > On 2025/02/26 16:57:45 "Rozov, Vlad" wrote:
>>>>>>>> >> +1 on fixing test jars, though the way how it is fixed needs to be discussed, IMO. In the short term removing jars may still be the best option to satisfy ASF legal policy and avoid release removal.
>>>>>>>> >>
>>>>>>>> >> AFAIK, ASF mandates that users and developers have source code that they build from (source release), not that they run (binary release).
>>>>>>>> >>
>>>>>>>> >> Thank you,
>>>>>>>> >>
>>>>>>>> >> Vlad
>>>>>>>> >>
>>>>>>>> >>> On Feb 26, 2025, at 8:47 AM, Dongjoon Hyun <dongj...@apache.org> wrote:
>>>>>>>> >>>
>>>>>>>> >>> Thank you for your reply, Sean.
>>>>>>>> >>>
>>>>>>>> >>> I expected that argument exactly so that I started by quoting your sentence in the above.
>>>>>>>> >>>
>>>>>>>> >>> I understood the reasoning in 2018. However, there are two reasons why I brought this again in 2025:
>>>>>>>> >>>
>>>>>>>> >>> First, the open source sprit is technically and literally "no compiled code in a source release" like Apache Hadoop and Hive community does. Justin, Vlad, and Alex shared the same perspective to the Apache Spark PMC.
>>>>>>>> >>>
>>>>>>>> >>> $ tar tvf apache-hive-4.0.1-src.tar.gz | grep 'jar$' | wc -l
>>>>>>>> >>> 0
>>>>>>>> >>> $ tar tvfz hadoop-3.4.1-src.tar.gz | grep 'jar$' | wc -l
>>>>>>>> >>> 0
>>>>>>>> >>>
>>>>>>>> >>> Second, last year, the open source communities were hit by CVE-2024-3094 ("XZ Utils Backdoor") in the world-wide manner where the backdoor was hidden in the test object. I believe most of us are aware of that. At that time, the GitHub repository was disabled. As a member of Apache Spark PMC, I'm suggesting to remove that risk from the Apache Spark repository in 2025. I attached the following link to provide the XZ Utils history explicitly.
>>>>>>>> >>>
>>>>>>>> >>> https://www.akamai.com/blog/security-research/critical-linux-backdoor-xz-utils-discovered-what-to-know
>>>>>>>> >>>
>>>>>>>> >>> Although I agree that those test coverages are important, I don't think that's worthy for Apache Spark community to take a risk to be shutdown. That's the lesson which I've learned last year.
>>>>>>>> >>>
>>>>>>>> >>> Sincerely,
>>>>>>>> >>> Dongjoon.
>>>>>>>> >>>
>>>>>>>> >>> On 2025/02/26 13:31:56 Sean Owen wrote:
>>>>>>>> >>>> The gist of the initial 2018 thread was:
>>>>>>>> >>>> These are not source .jar files that users use, but .jar files used to test
>>>>>>>> >>>> loading of from .jar files. These are test resources only.
>>>>>>>> >>>> I don't think this is what the spirit of the rule is speaking to, that the
>>>>>>>> >>>> end-user code should always have source code, which is the right principle.
>>>>>>>> >>>> Checking in the code somewhere is nice to have though and I think that was
>>>>>>>> >>>> the idea here.
>>>>>>>> >>>>
>>>>>>>> >>>> But, removing these and disabling potentially valuable tests seems like a
>>>>>>>> >>>> step too far. There is no actual 'problem' w.r.t. the principle that users
>>>>>>>> >>>> have source to the code they run.
>>>>>>>> >>>>
>>>>>>>> >>>> The 2025 thread just retreads the same ground as the 2018 thread.
>>>>>>>> >>>> But I don't see that we put this argument to the person who raised it
>>>>>>>> >>>> again. Why not that first?
>>>>>>>> >>>> And, if possible, go stick the source to these jars in the source tree,
>>>>>>>> >>>> where available.
>>>>>>>> >>>>
>>>>>>>> >>>>
>>>>>>>> >>>> On Wed, Feb 26, 2025 at 1:08 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
>>>>>>>> >>>> wrote:
>>>>>>>> >>>>
>>>>>>>> >>>>> Hi, All.
>>>>>>>> >>>>>
>>>>>>>> >>>>> Unfortunately, the Apache Spark project seems to have a technical debt in
>>>>>>>> >>>>> the source code releases. It happens to be discussed at least twice on both
>>>>>>>> >>>>> dev@spark and legal-discuss mailing lists. (Thank you for the head-up,
>>>>>>>> >>>>> Vlad.)
>>>>>>>> >>>>>
>>>>>>>> >>>>> 1. https://lists.apache.org/thread/3sxw9gwp51mrkzlo2xchq1g20gbgbnz8
>>>>>>>> >>>>> (2018-06-21, dev@spark)
>>>>>>>> >>>>> 2. https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k
>>>>>>>> >>>>> (2018-06-25, legal-discuss@)
>>>>>>>> >>>>> 3. https://lists.apache.org/thread/z3oq1db80vc8c7r6892hwjnq4h7hnwmd
>>>>>>>> >>>>> (2025-02-25, dev@spark)
>>>>>>>> >>>>>
>>>>>>>> >>>>> To be short, according to the previous conclusion in 2018, the Apache
>>>>>>>> >>>>> Spark community wanted to adhere to the ASF policy by removing those jar
>>>>>>>> >>>>> files from source code releases (although it was not considered as a
>>>>>>>> >>>>> release blocker at that time and until now).
>>>>>>>> >>>>>
>>>>>>>> >>>>>> it's important to be able to recreate these JARs somehow,
>>>>>>>> >>>>>> and I don't think we have the source in the repo for all of them
>>>>>>>> >>>>>> (at least, the ones that originate from Spark).
>>>>>>>> >>>>>> That much seems like a must-do. After that, seems worth figuring out
>>>>>>>> >>>>>> just how hard it is to build these artifacts from source.
>>>>>>>> >>>>>> If it's easy, great. If not, either the test can be removed or
>>>>>>>> >>>>>> we figure out just how hard a requirement this is.
>>>>>>>> >>>>>
>>>>>>>> >>>>> Given the unresolved issue for seven years, I proposed SPARK-51318 as a
>>>>>>>> >>>>> potential solution to comply with ASF policy. After SPARK-51318, we can
>>>>>>>> >>>>> recover the test coverage one by one later by addressing IDed TODO items
>>>>>>>> >>>>> without any legal concerns during the votes.
>>>>>>>> >>>>>
>>>>>>>> >>>>> https://issues.apache.org/jira/browse/SPARK-51318
>>>>>>>> >>>>> (Remove `jar` files from Apache Spark repository and disable affected
>>>>>>>> >>>>> tests)
>>>>>>>> >>>>>
>>>>>>>> >>>>> WDYT?
>>>>>>>> >>>>>
>>>>>>>> >>>>> BTW, please note that I didn't define SPARK-51318 as a blocker for any
>>>>>>>> >>>>> on-going releases yet.
>>>>>>>> >>>>>
>>>>>>>> >>>>> Best regards,
>>>>>>>> >>>>> Dongjoon.
>>>>>>>> >>>>>
>>>>>>>> >>>>
>>>>>>>> >>>
>>>>>>>> >>> ---------------------------------------------------------------------
>>>>>>>> >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org