Vlad,

We are in conflict because you want the project to fix the issue immediately, while Dongjoon stated in his post that he does not want to block the release just because of this. We have already delayed the release of Apache Spark 4.0.0 a lot (it is going to be months now), and I do not want to see us enforcing a holistic solution immediately and blocking the release over this.
If you raise this now but extend the timeline beyond Spark 4.0.0 (for example, targeting Spark 4.1.0), I think there would be no strong pushback against figuring out a long-term fix. You cast your -1 vote on the release (though non-binding), and so we are trying to address this, even if only with a short-term fix. What is wrong with this? Could you please allow a bit more time and not block the release, if you really want to see the long-term fix rather than the short-term one?

On Wed, Mar 26, 2025 at 8:48 AM Rozov, Vlad <vro...@amazon.com.invalid> wrote:

> Please see inline.
>
> Thank you,
>
> Vlad
>
> On Mar 25, 2025, at 1:42 PM, Hyukjin Kwon <gurwls...@apache.org> wrote:
>
> - the approach encourages keeping jars files in the Apache Spark repo
> Yes, and removes it from source releases. I believe this is a minimized
> change with AS-IS?
>
> Yes, it removes jars from the source release and satisfies the ASF release
> policy (see item 3 in my e-mail). At the same time it makes source release
> different from the Github including release tag and I don’t think that in
> the long term this is the right approach.
>
> - it is hard to identify what tests are impacted by jars so they can be
> properly fixed
> We have a list of test jars, and I will add the CI to check this after
> this PR.
>
> My question was regarding tests, not jars.
>
> - the solution relies on jar being present or not present on the
> classpath. Tests may be skipped unintentionally. It is also very easy to
> introduce new tests that do not skip if jar does not exist. Such test will
> break only during release.
> The tests themselves rely on how I check and skip the tests. Tests won't
> pass on the other condition. In addition, we already have similar skipped
> tests.
>
> For existing tests where you added condition, tests won’t pass, but may be
> incorrectly skipped if there is bug in the condition.
> That will provide wrong impression that test exists and passes where actually it is skipped
> on the condition. New tests may be added later that do not have condition
> and those will fail only during release.
>
> IMO, it is necessary to see if the source code for test jars is
> available or can be reconstructed. If not, it is necessary to see how the
> functionality still can be tested even if jar is not available. If the
> source code is available, to keep the tests it is necessary to build jars
> during tests or publish jars to maven and pull them as the test dependency.
> I agree but this is orthogonal to my question?
>
> If you agree, why not to temporarily disable tests?
>
> 1. From what I read, the actual concern from you I get is: "the solution
> relies on jar being present or not present on the classpath...".
> Maintaining test coverage is much more important than making the test code
> slightly harder to read IMO.
> I think Jungtaek explained it better at
> https://github.com/apache/spark/pull/50378#pullrequestreview-2712679827.
>
> 2. I have 6 +1s in https://github.com/apache/spark/pull/50378. I will
> merge this in 48 hours to resolve this issue. The community seems to agree
> with this approach.
>
> I raised my concerns with the approach that relies on detecting jar at the
> runtime and keeping UNLICENSED jar files in the Github. If the community
> agrees with your approach, I disagree and commit. Note that I still have an
> outstanding comment on your PR
> https://github.com/apache/spark/pull/50378#discussion_r2012935532.
>
> PS: I’m a bit disappointed that my email requesting a video call was
> ignored. Sometimes, a quick video call can save a lot of time compared to
> texting.
>
> Sorry, but I did not receive any email requesting video call. Where the
> request was made? I am open to the video call assuming that summary will be
> posted to the dev list.
>
> Note that I am disappointed that multiple requests to review my PRs were
> ignored or left unanswered too. I also have an outstanding question on the
> revert here
> https://lists.apache.org/thread/o8047n1cp8nc0q8c2ndht82h28p8j9jq.
>
> On Wed, 26 Mar 2025 at 04:14, Rozov, Vlad <vro...@amazon.com.invalid> wrote:
>
>> The policy [1] is quite clear and the fact that other projects do not
>> include compiled jars (including test jars) into the source release
>> confirms the rule:
>>
>> "Every ASF release MUST contain one or more source packages, which MUST
>> be sufficient for a user to build and test the release provided they have
>> access to the appropriate platform and tools. A source release SHOULD not
>> contain compiled code."
>>
>> In addition to that UNLICENSED artifacts are against ASF policy as well.
>>
>> At this point there are 3 ways to approach the issue:
>>
>> 1. Release as is with jars.
>> 2. Remove jars and disable affected test. Enable individual tests once
>> source code for those jars is provided.
>> 3. Remove jars from the source release only and keep them in the GitHub
>> repo.
>>
>> My vote is to proceed with 2 and I don’t see why it is not solving the
>> issue in your opinion. At the end it is up to PMC members to decide and
>> cast the vote.
>>
>> Thank you,
>>
>> Vlad
>>
>> [1] https://www.apache.org/legal/release-policy.html#artifacts
>>
>> On Mar 25, 2025, at 11:29 AM, Sean Owen <sro...@gmail.com> wrote:
>>
>> I personally think you are reading this too narrowly; the principle is,
>> as given:
>> "...MUST contain one or more source packages, which MUST be sufficient
>> for a user to build and test the release..."
>> "All releases are in the form of the source materials needed to make
>> changes to the software being released."
>>
>> I don't think the status quo actually contravenes that.
>> That said, everyone is in agreement to just clean this up.
>> But I think your position isn't actually solving any problem that this >> principle is intended to prevent. >> >> On Tue, Mar 25, 2025 at 1:25 PM Rozov, Vlad <vro...@amazon.com.invalid> >> wrote: >> >>> I already casted my vote. To clarify, having compiled unlicensed jars in >>> the source release is strictly against ASF policy [1]. Between a tiny >>> chance that some tests and functionality will break and a small chance that >>> ASF will request pull out of a long awaited release due to the policy >>> violation, I’d rather choose to break those tests. >>> >>> Thank you, >>> >>> Vlad >>> >>> PS. In addition to Hive and Hadoop source releases that Dongjoon >>> checked, I checked Apache Flink and Beam and none of those releases >>> includes jars. >>> >>> [1] https://www.apache.org/legal/release-policy.html >>> >>> >>> On Mar 25, 2025, at 8:46 AM, Holden Karau <holden.ka...@gmail.com> >>> wrote: >>> >>> So I think if I understand folks concerns it’s that we’ve let it slide >>> in the past and at some point we’ve got to stop letting it slide because >>> there is some concern we might not be meeting the ASF guidance here. >>> >>> Personally I think given they’re test artifacts and how delayed Spark 4 >>> is we should not block the release but we can agree to block anything >>> beyond Spark 4 on this as a compromise. >>> >>> What do folks think? >>> >>> Twitter: https://twitter.com/holdenkarau >>> Fight Health Insurance: https://www.fighthealthinsurance.com/ >>> <https://www.fighthealthinsurance.com/?q=hk_email> >>> Books (Learning Spark, High Performance Spark, etc.): >>> https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9> >>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau >>> Pronouns: she/her >>> >>> >>> On Tue, Mar 25, 2025 at 8:43 AM Reynold Xin <r...@databricks.com.invalid> >>> wrote: >>> >>>> While I'd love to resolve this issue, I still don't understand why we >>>> would block the release for this. 
>>>> >>>> >>>> >>>> On Tue, Mar 25, 2025 at 7:49 AM Rozov, Vlad <vro...@amazon.com.invalid> >>>> wrote: >>>> >>>>> The difference is in the way how tests are disabled. >>>>> >>>>> - the approach encourages keeping jars files in the Apache Spark repo >>>>> - it is hard to identify what tests are impacted by jars so they can >>>>> be properly fixed >>>>> - the solution relies on jar being present or not present on the >>>>> classpath. Tests may be skipped unintentionally. It is also very easy to >>>>> introduce new tests that do not skip if jar does not exist. Such test will >>>>> break only during release. >>>>> >>>>> IMO, it is necessary to see if the source code for test jars is >>>>> available or can be reconstructed. If not, it is necessary to see how the >>>>> functionality still can be tested even if jar is not available. If the >>>>> source code is available, to keep the tests it is necessary to build jars >>>>> during tests or publish jars to maven and pull them as the test >>>>> dependency. >>>>> >>>>> Thank you, >>>>> >>>>> Vlad >>>>> >>>>> On Mar 24, 2025, at 11:52 PM, Hyukjin Kwon <gurwls...@apache.org> >>>>> wrote: >>>>> >>>>> What's the difference between disabling tests for dev and release vs >>>>> only for release? >>>>> >>>>> On Tue, 25 Mar 2025 at 15:36, Rozov, Vlad <vro...@amazon.com.invalid> >>>>> wrote: >>>>> >>>>>> Overall I don’t buy the solution where tests are skipped based on the >>>>>> presence of a jar file. It looks too fragile to me. What if there is a >>>>>> bug >>>>>> that does not add jar to a classpath? The test would be skipped, but not >>>>>> because jar was deleted, but because classpath is incorrect. >>>>>> >>>>>> Thank you, >>>>>> >>>>>> Vlad >>>>>> >>>>>> On Mar 24, 2025, at 7:56 PM, Hyukjin Kwon <gurwls...@apache.org> >>>>>> wrote: >>>>>> >>>>>> Valid concern. Maybe we can mark tests ignored when those tests do >>>>>> not exist for now. So tagged commit will skip those tests. Dev commits >>>>>> will >>>>>> still test them. 
>>>>>> >>>>>> On Tue, 25 Mar 2025 at 11:47, Jungtaek Lim < >>>>>> kabhwan.opensou...@gmail.com> wrote: >>>>>> >>>>>>> Maybe we should also check that it is mandatory for source code >>>>>>> being distributed under release to be able to pass the test suites? If >>>>>>> this >>>>>>> is mandatory, we can't just modify the release script to simply remove >>>>>>> the >>>>>>> jars, because this will break the tests in source code distribution. >>>>>>> >>>>>>> Actually this is my understanding to make sure tests pass from >>>>>>> source code and could build the same artifacts we release from source >>>>>>> code, >>>>>>> but I might be wrong. >>>>>>> >>>>>>> On Tue, Mar 25, 2025 at 11:32 AM Hyukjin Kwon <gurwls...@apache.org> >>>>>>> wrote: >>>>>>> >>>>>>>> Made a PR first (https://github.com/apache/spark/pull/50378). >>>>>>>> >>>>>>>> BTW, I agree that we should have the source code along with the >>>>>>>> jars, and ideally the dev branch should not contain them as well. This >>>>>>>> is a >>>>>>>> technical depth. >>>>>>>> For this, I hope we can improve this incrementally. >>>>>>>> >>>>>>>> I will also take a look and see if we can reject jars >>>>>>>> automatically in PRs or CI. >>>>>>>> >>>>>>>> >>>>>>>> On Tue, 25 Mar 2025 at 11:15, Hyukjin Kwon <gurwls...@apache.org> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> So the issues are source releases ( >>>>>>>>> https://github.com/apache/spark/tags) containing those jars, >>>>>>>>> right? Can we add the removal of test jars at the part of the release >>>>>>>>> process. >>>>>>>>> >>>>>>>>> They aren't included in binary releases in any event so removal on >>>>>>>>> every source release should work. >>>>>>>>> >>>>>>>>> On Tue, 25 Mar 2025 at 10:51, Jungtaek Lim < >>>>>>>>> kabhwan.opensou...@gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Let's make this very clear - do we not have a source code to >>>>>>>>>> build a jar, or have no way to infer the source code being used for >>>>>>>>>> the >>>>>>>>>> jar? 
>>>>>>>>>> >>>>>>>>>> I understand the concern, but if this is a huge issue, why no one >>>>>>>>>> has looked into this and here we just debate whether the affected >>>>>>>>>> tests >>>>>>>>>> need to be dropped/disabled or not? Whenever we add some test >>>>>>>>>> resources >>>>>>>>>> like a golden file, we tend to leave the part of the code to build >>>>>>>>>> the >>>>>>>>>> golden file. Did we check and confirm these jars are not the case >>>>>>>>>> and we >>>>>>>>>> lost the source code to build? >>>>>>>>>> >>>>>>>>>> On Tue, Mar 25, 2025 at 9:35 AM Rozov, Vlad >>>>>>>>>> <vro...@amazon.com.invalid> wrote: >>>>>>>>>> >>>>>>>>>>> First of all I don’t think that conclusion on the >>>>>>>>>>> https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k is >>>>>>>>>>> correct. Jar files included into the source release are compiled >>>>>>>>>>> from the >>>>>>>>>>> code and replacing them with dat or jpeg files won’t work. >>>>>>>>>>> Including jar >>>>>>>>>>> files into the source release is against ASF policy and my -1 will >>>>>>>>>>> stay as >>>>>>>>>>> long as jars are included into the source release. As this issue >>>>>>>>>>> was raised >>>>>>>>>>> not for the first time and there was no action (actually more jars >>>>>>>>>>> were >>>>>>>>>>> added), IMO, the issue should now be handled as the release blocker. >>>>>>>>>>> >>>>>>>>>>> I don’t see anything in the proposal that suggests that fix >>>>>>>>>>> for SPARK-51318 is or should be blocked by umbrella JIRA. The >>>>>>>>>>> proposal was >>>>>>>>>>> to recover tests one by one. The PR that I have open will allow to >>>>>>>>>>> accomplish these tasks as all disabled tests refer to >>>>>>>>>>> SPARK-51318. >>>>>>>>>>> >>>>>>>>>>> I can only help with SPARK-51318 at this point. Somebody else >>>>>>>>>>> will have to look into keeping tests enabled as it requires source >>>>>>>>>>> code for >>>>>>>>>>> the test jars. 
>>>>>>>>>>> >>>>>>>>>>> Thank you, >>>>>>>>>>> >>>>>>>>>>> Vlad >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Mar 24, 2025, at 4:55 PM, Hyukjin Kwon <gurwls...@apache.org> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> I still disagree with just disabling tests and removing the jars >>>>>>>>>>> without making sure that we will enable them back. >>>>>>>>>>> I want to EITHER make sure we have a plan and someone to drive, >>>>>>>>>>> and the tests will be enabled back, OR have a one fix that does all. >>>>>>>>>>> Otherwise, my -1 stands if we can't be sure of that. >>>>>>>>>>> >>>>>>>>>>> On Tue, 25 Mar 2025 at 08:51, Hyukjin Kwon <gurwls...@apache.org> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> From what I read in the last discussion in the legal thread ( >>>>>>>>>>>> https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k), >>>>>>>>>>>> we don't really need to rush and block the release. >>>>>>>>>>>> I don't think we should block the release, remove the CI, and >>>>>>>>>>>> just remove the jars. >>>>>>>>>>>> >>>>>>>>>>>> Rozov, the original proposal of this thread is 1. to first >>>>>>>>>>>> disable the tests, and 2. open an umbrella JIRA to enable >>>>>>>>>>>> individual tests. >>>>>>>>>>>> Since you're driving this, would you mind either making a >>>>>>>>>>>> proper fix in one go, or create an umbrella JIRA to drive this? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Mon, 24 Mar 2025 at 23:46, Rozov, Vlad >>>>>>>>>>>> <vro...@amazon.com.invalid> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Let’s open a formal vote on the subject. I have open WIP PR >>>>>>>>>>>>> https://github.com/apache/spark/pull/50231 that is currently >>>>>>>>>>>>> blocked by -1. >>>>>>>>>>>>> >>>>>>>>>>>>> Thank you, >>>>>>>>>>>>> >>>>>>>>>>>>> Vlad >>>>>>>>>>>>> >>>>>>>>>>>>> On Mar 24, 2025, at 7:05 AM, Wenchen Fan <cloud0...@gmail.com> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> It seems there’s no quick fix for this issue. 
Should we remove >>>>>>>>>>>>> these jars and disable the tests for now to comply with ASF >>>>>>>>>>>>> policy? While >>>>>>>>>>>>> this would temporarily reduce test coverage until we refactor the >>>>>>>>>>>>> tests to >>>>>>>>>>>>> avoid pre-compiled jars, we can encourage Spark vendors not to >>>>>>>>>>>>> cherry-pick >>>>>>>>>>>>> this test-disabling commit so they can help report any test >>>>>>>>>>>>> failures. That >>>>>>>>>>>>> said, since these tests are quite old and stable, failures are >>>>>>>>>>>>> unlikely. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Wenchen >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Mar 13, 2025 at 12:15 AM Rozov, Vlad >>>>>>>>>>>>> <vro...@amazon.com.invalid> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> There is a difference between technical debt and legal issue. >>>>>>>>>>>>>> ASF may request to pull out release that does not meet ASF >>>>>>>>>>>>>> policy (and >>>>>>>>>>>>>> having tests is not ASF policy). IMO, SPARK-51318 should be a >>>>>>>>>>>>>> blocker for >>>>>>>>>>>>>> the next release or handled like a blocker. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thank you, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Vlad >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Mar 10, 2025, at 6:02 PM, Jungtaek Lim < >>>>>>>>>>>>>> kabhwan.opensou...@gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> +1 to Hyukjin. If the test is effective, we should definitely >>>>>>>>>>>>>> retain the effectiveness of the test, unless we end up with the >>>>>>>>>>>>>> conclusion >>>>>>>>>>>>>> that there is no way to do that. >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Tue, Mar 11, 2025 at 9:29 AM Hyukjin Kwon < >>>>>>>>>>>>>> gurwls...@apache.org> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> If we should fix, let's make sure we don't just disable the >>>>>>>>>>>>>>> tests - we will create another set of technical debt. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Thu, 27 Feb 2025 at 09:11, Rozov, Vlad >>>>>>>>>>>>>>> <vro...@amazon.com.invalid> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I’ll look into the JIRA. 
Please assign it to me. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thank you, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Vlad >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> > On Feb 26, 2025, at 11:33 PM, Yang Jie < >>>>>>>>>>>>>>>> yangji...@apache.org> wrote: >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > +1, Agree to remove the jar files from the Apache Spark >>>>>>>>>>>>>>>> repository and disable the affected tests. >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > For the current test scenarios that use jar files, I >>>>>>>>>>>>>>>> believe we can definitely find a more reasonable testing >>>>>>>>>>>>>>>> approach. >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > Thanks, >>>>>>>>>>>>>>>> > Jie Yang >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > On 2025/02/26 16:57:45 "Rozov, Vlad" wrote: >>>>>>>>>>>>>>>> >> +1 on fixing test jars, though the way how it is fixed >>>>>>>>>>>>>>>> needs to be discussed, IMO. In the short term removing jars >>>>>>>>>>>>>>>> may still be >>>>>>>>>>>>>>>> the best option to satisfy ASF legal policy and avoid release >>>>>>>>>>>>>>>> removal. >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> AFAIK, ASF mandates that users and developers have >>>>>>>>>>>>>>>> source code that they build from (source release), not that >>>>>>>>>>>>>>>> they run >>>>>>>>>>>>>>>> (binary release). >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> Thank you, >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> Vlad >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >>> On Feb 26, 2025, at 8:47 AM, Dongjoon Hyun < >>>>>>>>>>>>>>>> dongj...@apache.org> wrote: >>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>> >>> Thank you for your reply, Sean. >>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>> >>> I expected that argument exactly so that I started by >>>>>>>>>>>>>>>> quoting your sentence in the above. >>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>> >>> I understood the reasoning in 2018. 
>>>>>>>>>>>>>>>> >>> However, there are two reasons why I brought this again in 2025:
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>> First, the open source spirit is technically and literally "no compiled code in a source release" like Apache Hadoop and Hive community does. Justin, Vlad, and Alex shared the same perspective to the Apache Spark PMC.
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>> $ tar tvf apache-hive-4.0.1-src.tar.gz | grep 'jar$' | wc -l
>>>>>>>>>>>>>>>> >>> 0
>>>>>>>>>>>>>>>> >>> $ tar tvfz hadoop-3.4.1-src.tar.gz | grep 'jar$' | wc -l
>>>>>>>>>>>>>>>> >>> 0
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>> Second, last year, the open source communities were hit by CVE-2024-3094 ("XZ Utils Backdoor") in the world-wide manner where the backdoor was hidden in the test object. I believe most of us are aware of that. At that time, the GitHub repository was disabled. As a member of Apache Spark PMC, I'm suggesting to remove that risk from the Apache Spark repository in 2025. I attached the following link to provide the XZ Utils history explicitly.
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>> https://www.akamai.com/blog/security-research/critical-linux-backdoor-xz-utils-discovered-what-to-know
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>> Although I agree that those test coverages are important, I don't think that's worthy for Apache Spark community to take a risk to be shutdown. That's the lesson which I've learned last year.
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>> Sincerely,
>>>>>>>>>>>>>>>> >>> Dongjoon.
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>> On 2025/02/26 13:31:56 Sean Owen wrote:
>>>>>>>>>>>>>>>> >>>> The gist of the initial 2018 thread was:
>>>>>>>>>>>>>>>> >>>> These are not source .jar files that users use, but .jar files used to test
>>>>>>>>>>>>>>>> >>>> loading of from .jar files. These are test resources only.
>>>>>>>>>>>>>>>> >>>> I don't think this is what the spirit of the rule is speaking to, that the
>>>>>>>>>>>>>>>> >>>> end-user code should always have source code, which is the right principle.
>>>>>>>>>>>>>>>> >>>> Checking in the code somewhere is nice to have though and I think that was
>>>>>>>>>>>>>>>> >>>> the idea here.
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>> But, removing these and disabling potentially valuable tests seems like a
>>>>>>>>>>>>>>>> >>>> step too far. There is no actual 'problem' w.r.t. the principle that users
>>>>>>>>>>>>>>>> >>>> have source to the code they run.
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>> The 2025 thread just retreads the same ground as the 2018 thread.
>>>>>>>>>>>>>>>> >>>> But I don't see that we put this argument to the person who raised it
>>>>>>>>>>>>>>>> >>>> again. Why not that first?
>>>>>>>>>>>>>>>> >>>> And, if possible, go stick the source to these jars in the source tree,
>>>>>>>>>>>>>>>> >>>> where available.
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>> On Wed, Feb 26, 2025 at 1:08 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
>>>>>>>>>>>>>>>> >>>> wrote:
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>>> Hi, All.
>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>> >>>>> Unfortunately, the Apache Spark project seems to have a technical debt in
>>>>>>>>>>>>>>>> >>>>> the source code releases.
>>>>>>>>>>>>>>>> >>>>> It happens to be discussed at least twice on both
>>>>>>>>>>>>>>>> >>>>> dev@spark and legal-discuss mailing lists. (Thank you for the head-up,
>>>>>>>>>>>>>>>> >>>>> Vlad.)
>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>> >>>>> 1. https://lists.apache.org/thread/3sxw9gwp51mrkzlo2xchq1g20gbgbnz8
>>>>>>>>>>>>>>>> >>>>> (2018-06-21, dev@spark)
>>>>>>>>>>>>>>>> >>>>> 2. https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k
>>>>>>>>>>>>>>>> >>>>> (2018-06-25, legal-discuss@)
>>>>>>>>>>>>>>>> >>>>> 3. https://lists.apache.org/thread/z3oq1db80vc8c7r6892hwjnq4h7hnwmd
>>>>>>>>>>>>>>>> >>>>> (2025-02-25, dev@spark)
>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>> >>>>> To be short, according to the previous conclusion in 2018, the Apache
>>>>>>>>>>>>>>>> >>>>> Spark community wanted to adhere to the ASF policy by removing those jar
>>>>>>>>>>>>>>>> >>>>> files from source code releases (although it was not considered as a
>>>>>>>>>>>>>>>> >>>>> release blocker at that time and until now).
>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>> >>>>>> it's important to be able to recreate these JARs somehow,
>>>>>>>>>>>>>>>> >>>>>> and I don't think we have the source in the repo for all of them
>>>>>>>>>>>>>>>> >>>>>> (at least, the ones that originate from Spark).
>>>>>>>>>>>>>>>> >>>>>> That much seems like a must-do. After that, seems worth figuring out
>>>>>>>>>>>>>>>> >>>>>> just how hard it is to build these artifacts from source.
>>>>>>>>>>>>>>>> >>>>>> If it's easy, great. If not, either the test can be removed or
>>>>>>>>>>>>>>>> >>>>>> we figure out just how hard a requirement this is.
>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>> >>>>> Given the unresolved issue for seven years, I proposed SPARK-51318 as a
>>>>>>>>>>>>>>>> >>>>> potential solution to comply with ASF policy. After SPARK-51318, we can
>>>>>>>>>>>>>>>> >>>>> recover the test coverage one by one later by addressing IDed TODO items
>>>>>>>>>>>>>>>> >>>>> without any legal concerns during the votes.
>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>> >>>>> https://issues.apache.org/jira/browse/SPARK-51318
>>>>>>>>>>>>>>>> >>>>> (Remove `jar` files from Apache Spark repository and disable affected
>>>>>>>>>>>>>>>> >>>>> tests)
>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>> >>>>> WDYT?
>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>> >>>>> BTW, please note that I didn't define SPARK-51318 as a blocker for any
>>>>>>>>>>>>>>>> >>>>> on-going releases yet.
>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>> >>>>> Best regards,
>>>>>>>>>>>>>>>> >>>>> Dongjoon.
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>> ---------------------------------------------------------------------
>>>>>>>>>>>>>>>> >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org