I am confused. Consensus was reached pretty clearly in https://github.com/apache/spark/pull/50378, and CI passed. It now has 9 +1s from many different groups. Why do we need to change course? I don't think we should override the community consensus just because you think the approach is hacky.

On Wed, 26 Mar 2025 at 11:40, Rozov, Vlad <vro...@amazon.com.invalid> wrote:

> I think that there is some miscommunication/misunderstanding, so I’d like to clarify my view on the issue.
>
> 1. I don’t think there is a conflict. I think that overall almost all agree that having jar files in the Apache source release does not comply with the Apache release policy and they need to be removed.
> 2. The question is when and how to remove them. My initial assumption was that jars would be removed as part of 4.1.0 and backported to 3.5.x.
> 3. With the above assumption I voted -0 on 3.5.5 and opened the WIP PR https://github.com/apache/spark/pull/50231 with the plan to still vote -0 on 4.0 RC as long as jars are still part of the source release.
> 4. HyukjinKwon@ blocked that PR with -1 (https://github.com/apache/spark/pull/50231#issuecomment-2714485887) giving tests priority over ASF policy.
> 5. There was no indication that he (or somebody else) would work on removing the jars as part of the 4.1.0 release, as he cast a -1 veto.
> 6. That caused me to change my vote from -0 to -1 on the 4.0 release as it sounded that the issue would not be addressed not only in 4.0 and 3.5.5 but also in 4.1.0 and 3.5.6.
> 7. The solution proposed by HyukjinKwon@ looks like a hack to me.
>
> To move forward, let’s mark SPARK-51318 as blocking for 4.1.0 and 3.5.6, remove -1 on https://github.com/apache/spark/pull/50231, agree that skipped tests would be fixed in the follow up PRs (4.1.x). Does that sound like a good plan to you?
>
> Thank you,
>
> Vlad
>
> On Mar 25, 2025, at 5:17 PM, Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>
> Vlad,
>
> We are conflicted because you immediately want the project to fix the issue, while Dongjoon stated in the post that he does not want to block the release just because of this.
> We delayed the release of Apache Spark 4.0.0 a lot already (going to be month"s" now), and I do not want to see us enforcing a holistic solution immediately and blocking release due to this.
>
> If you claim this now but open the timeline longer, beyond Spark 4.0.0 (like, setting timeline to Spark 4.1.0), I think there is no strong pushback about figuring out a long term fix. You cast your -1 vote in release (though non-binding), and so we are trying to address this even with the short term fix. What is wrong with this? Can you please open this to be a bit longer and do not block the release if you really want to see the long term fix rather than short term one?
>
> On Wed, Mar 26, 2025 at 8:48 AM Rozov, Vlad <vro...@amazon.com.invalid> wrote:
>
>> Please see inline.
>>
>> Thank you,
>>
>> Vlad
>>
>> On Mar 25, 2025, at 1:42 PM, Hyukjin Kwon <gurwls...@apache.org> wrote:
>>
>> > - the approach encourages keeping jars files in the Apache Spark repo
>> Yes, and removes it from source releases. I believe this is a minimized change with AS-IS?
>>
>> Yes, it removes jars from the source release and satisfies the ASF release policy (see item 3 in my e-mail). At the same time it makes source release different from the Github including release tag and I don’t think that in the long term this is the right approach.
>>
>> > - it is hard to identify what tests are impacted by jars so they can be properly fixed
>> We have a list of test jars, and I will add the CI to check this after this PR.
>>
>> My question was regarding tests, not jars.
>>
>> > - the solution relies on jar being present or not present on the classpath. Tests may be skipped unintentionally. It is also very easy to introduce new tests that do not skip if jar does not exist. Such test will break only during release.
>> The tests themselves rely on how I check and skip the tests. Tests won't pass on the other condition.
>> In addition, we already have similar skipped tests.
>>
>> For existing tests where you added condition, tests won’t pass, but may be incorrectly skipped if there is bug in the condition. That will provide wrong impression that test exists and passes where actually it is skipped on the condition. New tests may be added later that do not have condition and those will fail only during release.
>>
>> > IMO, it is necessary to see if the source code for test jars is available or can be reconstructed. If not, it is necessary to see how the functionality still can be tested even if jar is not available. If the source code is available, to keep the tests it is necessary to build jars during tests or publish jars to maven and pull them as the test dependency.
>> I agree but this is orthogonal to my question?
>>
>> If you agree, why not to temporarily disable tests?
>>
>> 1. From what I read, the actual concern from you I get is: "the solution relies on jar being present or not present on the classpath...".
>> Maintaining test coverage is much more important than making the test code slightly harder to read IMO.
>> I think Jungtaek explained it better at https://github.com/apache/spark/pull/50378#pullrequestreview-2712679827.
>>
>> 2. I have 6 +1s in https://github.com/apache/spark/pull/50378. I will merge this in 48 hours to resolve this issue. The community seems to agree with this approach.
>>
>> I raised my concerns with the approach that relies on detecting jar at the runtime and keeping UNLICENSED jar files in the Github. If the community agrees with your approach, I disagree and commit. Note that I still have an outstanding comment on your PR https://github.com/apache/spark/pull/50378#discussion_r2012935532.
>>
>> PS: I’m a bit disappointed that my email requesting a video call was ignored. Sometimes, a quick video call can save a lot of time compared to texting.
>>
>> Sorry, but I did not receive any email requesting video call. Where the request was made? I am open to the video call assuming that summary will be posted to the dev list.
>>
>> Note that I am disappointed that multiple requests to review my PRs were ignored or left unanswered too. I also have an outstanding question on the revert here https://lists.apache.org/thread/o8047n1cp8nc0q8c2ndht82h28p8j9jq.
>>
>> On Wed, 26 Mar 2025 at 04:14, Rozov, Vlad <vro...@amazon.com.invalid> wrote:
>>
>>> The policy [1] is quite clear and the fact that other projects do not include compiled jars (including test jars) into the source release confirms the rule:
>>>
>>> "Every ASF release MUST contain one or more source packages, which MUST be sufficient for a user to build and test the release provided they have access to the appropriate platform and tools. A source release SHOULD not contain compiled code."
>>>
>>> In addition to that UNLICENSED artifacts are against ASF policy as well.
>>>
>>> At this point there are 3 ways to approach the issue:
>>>
>>> 1. Release as is with jars.
>>> 2. Remove jars and disable affected test. Enable individual tests once source code for those jars is provided.
>>> 3. Remove jars from the source release only and keep them in the GitHub repo.
>>>
>>> My vote is to proceed with 2 and I don’t see why it is not solving the issue in your opinion. At the end it is up to PMC members to decide and cast the vote.
>>>
>>> Thank you,
>>>
>>> Vlad
>>>
>>> [1] https://www.apache.org/legal/release-policy.html#artifacts
>>>
>>> On Mar 25, 2025, at 11:29 AM, Sean Owen <sro...@gmail.com> wrote:
>>>
>>> I personally think you are reading this too narrowly; the principle is, as given:
>>> "...MUST contain one or more source packages, which MUST be sufficient for a user to build and test the release..."
>>> "All releases are in the form of the source materials needed to make changes to the software being released."
>>>
>>> I don't think the status quo actually contravenes that.
>>> That said, everyone is in agreement to just clean this up.
>>> But I think your position isn't actually solving any problem that this principle is intended to prevent.
>>>
>>> On Tue, Mar 25, 2025 at 1:25 PM Rozov, Vlad <vro...@amazon.com.invalid> wrote:
>>>
>>>> I already cast my vote. To clarify, having compiled unlicensed jars in the source release is strictly against ASF policy [1]. Between a tiny chance that some tests and functionality will break and a small chance that ASF will request pull out of a long awaited release due to the policy violation, I’d rather choose to break those tests.
>>>>
>>>> Thank you,
>>>>
>>>> Vlad
>>>>
>>>> PS. In addition to Hive and Hadoop source releases that Dongjoon checked, I checked Apache Flink and Beam and none of those releases includes jars.
>>>>
>>>> [1] https://www.apache.org/legal/release-policy.html
>>>>
>>>> On Mar 25, 2025, at 8:46 AM, Holden Karau <holden.ka...@gmail.com> wrote:
>>>>
>>>> So I think if I understand folks' concerns it’s that we’ve let it slide in the past and at some point we’ve got to stop letting it slide because there is some concern we might not be meeting the ASF guidance here.
>>>>
>>>> Personally I think given they’re test artifacts and how delayed Spark 4 is we should not block the release but we can agree to block anything beyond Spark 4 on this as a compromise.
>>>>
>>>> What do folks think?
>>>>
>>>> Twitter: https://twitter.com/holdenkarau
>>>> Fight Health Insurance: https://www.fighthealthinsurance.com/ <https://www.fighthealthinsurance.com/?q=hk_email>
>>>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9>
>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>> Pronouns: she/her
>>>>
>>>> On Tue, Mar 25, 2025 at 8:43 AM Reynold Xin <r...@databricks.com.invalid> wrote:
>>>>
>>>>> While I'd love to resolve this issue, I still don't understand why we would block the release for this.
>>>>>
>>>>> On Tue, Mar 25, 2025 at 7:49 AM Rozov, Vlad <vro...@amazon.com.invalid> wrote:
>>>>>
>>>>>> The difference is in the way how tests are disabled.
>>>>>>
>>>>>> - the approach encourages keeping jars files in the Apache Spark repo
>>>>>> - it is hard to identify what tests are impacted by jars so they can be properly fixed
>>>>>> - the solution relies on jar being present or not present on the classpath. Tests may be skipped unintentionally. It is also very easy to introduce new tests that do not skip if jar does not exist. Such test will break only during release.
>>>>>>
>>>>>> IMO, it is necessary to see if the source code for test jars is available or can be reconstructed. If not, it is necessary to see how the functionality still can be tested even if jar is not available. If the source code is available, to keep the tests it is necessary to build jars during tests or publish jars to maven and pull them as the test dependency.
>>>>>>
>>>>>> Thank you,
>>>>>>
>>>>>> Vlad
>>>>>>
>>>>>> On Mar 24, 2025, at 11:52 PM, Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>>>>
>>>>>> What's the difference between disabling tests for dev and release vs only for release?
>>>>>>
>>>>>> On Tue, 25 Mar 2025 at 15:36, Rozov, Vlad <vro...@amazon.com.invalid> wrote:
>>>>>>
>>>>>>> Overall I don’t buy the solution where tests are skipped based on the presence of a jar file. It looks too fragile to me. What if there is a bug that does not add jar to a classpath? The test would be skipped, but not because jar was deleted, but because classpath is incorrect.
>>>>>>>
>>>>>>> Thank you,
>>>>>>>
>>>>>>> Vlad
>>>>>>>
>>>>>>> On Mar 24, 2025, at 7:56 PM, Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>>>>>
>>>>>>> Valid concern. Maybe we can mark tests ignored when those tests do not exist for now. So tagged commit will skip those tests. Dev commits will still test them.
>>>>>>>
>>>>>>> On Tue, 25 Mar 2025 at 11:47, Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Maybe we should also check that it is mandatory for source code being distributed under release to be able to pass the test suites? If this is mandatory, we can't just modify the release script to simply remove the jars, because this will break the tests in source code distribution.
>>>>>>>>
>>>>>>>> Actually this is my understanding to make sure tests pass from source code and could build the same artifacts we release from source code, but I might be wrong.
>>>>>>>>
>>>>>>>> On Tue, Mar 25, 2025 at 11:32 AM Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Made a PR first (https://github.com/apache/spark/pull/50378).
>>>>>>>>>
>>>>>>>>> BTW, I agree that we should have the source code along with the jars, and ideally the dev branch should not contain them as well. This is a technical debt.
>>>>>>>>> For this, I hope we can improve this incrementally.
>>>>>>>>>
>>>>>>>>> I will also take a look and see if we can reject jars automatically in PRs or CI.
>>>>>>>>>
>>>>>>>>> On Tue, 25 Mar 2025 at 11:15, Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> So the issues are source releases (https://github.com/apache/spark/tags) containing those jars, right? Can we add the removal of test jars as part of the release process?
>>>>>>>>>>
>>>>>>>>>> They aren't included in binary releases in any event so removal on every source release should work.
>>>>>>>>>>
>>>>>>>>>> On Tue, 25 Mar 2025 at 10:51, Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Let's make this very clear - do we not have a source code to build a jar, or have no way to infer the source code being used for the jar?
>>>>>>>>>>>
>>>>>>>>>>> I understand the concern, but if this is a huge issue, why no one has looked into this and here we just debate whether the affected tests need to be dropped/disabled or not? Whenever we add some test resources like a golden file, we tend to leave the part of the code to build the golden file. Did we check and confirm these jars are not the case and we lost the source code to build?
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Mar 25, 2025 at 9:35 AM Rozov, Vlad <vro...@amazon.com.invalid> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> First of all I don’t think that conclusion on the https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k is correct. Jar files included into the source release are compiled from the code and replacing them with dat or jpeg files won’t work. Including jar files into the source release is against ASF policy and my -1 will stay as long as jars are included into the source release.
>>>>>>>>>>>> As this issue was raised not for the first time and there was no action (actually more jars were added), IMO, the issue should now be handled as the release blocker.
>>>>>>>>>>>>
>>>>>>>>>>>> I don’t see anything in the proposal that suggests that fix for SPARK-51318 is or should be blocked by umbrella JIRA. The proposal was to recover tests one by one. The PR that I have open will allow to accomplish these tasks as all disabled tests refer to SPARK-51318.
>>>>>>>>>>>>
>>>>>>>>>>>> I can only help with SPARK-51318 at this point. Somebody else will have to look into keeping tests enabled as it requires source code for the test jars.
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you,
>>>>>>>>>>>>
>>>>>>>>>>>> Vlad
>>>>>>>>>>>>
>>>>>>>>>>>> On Mar 24, 2025, at 4:55 PM, Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> I still disagree with just disabling tests and removing the jars without making sure that we will enable them back.
>>>>>>>>>>>> I want to EITHER make sure we have a plan and someone to drive, and the tests will be enabled back, OR have a one fix that does all.
>>>>>>>>>>>> Otherwise, my -1 stands if we can't be sure of that.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, 25 Mar 2025 at 08:51, Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> From what I read in the last discussion in the legal thread (https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k), we don't really need to rush and block the release.
>>>>>>>>>>>>> I don't think we should block the release, remove the CI, and just remove the jars.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Rozov, the original proposal of this thread is 1. to first disable the tests, and 2.
>>>>>>>>>>>>> open an umbrella JIRA to enable individual tests.
>>>>>>>>>>>>> Since you're driving this, would you mind either making a proper fix in one go, or create an umbrella JIRA to drive this?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, 24 Mar 2025 at 23:46, Rozov, Vlad <vro...@amazon.com.invalid> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Let’s open a formal vote on the subject. I have open WIP PR https://github.com/apache/spark/pull/50231 that is currently blocked by -1.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Vlad
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mar 24, 2025, at 7:05 AM, Wenchen Fan <cloud0...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It seems there’s no quick fix for this issue. Should we remove these jars and disable the tests for now to comply with ASF policy? While this would temporarily reduce test coverage until we refactor the tests to avoid pre-compiled jars, we can encourage Spark vendors not to cherry-pick this test-disabling commit so they can help report any test failures. That said, since these tests are quite old and stable, failures are unlikely.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Wenchen
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Mar 13, 2025 at 12:15 AM Rozov, Vlad <vro...@amazon.com.invalid> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> There is a difference between technical debt and legal issue. ASF may request to pull out release that does not meet ASF policy (and having tests is not ASF policy). IMO, SPARK-51318 should be a blocker for the next release or handled like a blocker.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thank you,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Vlad
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mar 10, 2025, at 6:02 PM, Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +1 to Hyukjin. If the test is effective, we should definitely retain the effectiveness of the test, unless we end up with the conclusion that there is no way to do that.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Mar 11, 2025 at 9:29 AM Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If we should fix, let's make sure we don't just disable the tests - we will create another set of technical debt.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, 27 Feb 2025 at 09:11, Rozov, Vlad <vro...@amazon.com.invalid> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I’ll look into the JIRA. Please assign it to me.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thank you,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Vlad
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> > On Feb 26, 2025, at 11:33 PM, Yang Jie <yangji...@apache.org> wrote:
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > +1, Agree to remove the jar files from the Apache Spark repository and disable the affected tests.
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > For the current test scenarios that use jar files, I believe we can definitely find a more reasonable testing approach.
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > Thanks,
>>>>>>>>>>>>>>>>> > Jie Yang
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > On 2025/02/26 16:57:45 "Rozov, Vlad" wrote:
>>>>>>>>>>>>>>>>> >> +1 on fixing test jars, though the way how it is fixed needs to be discussed, IMO. In the short term removing jars may still be the best option to satisfy ASF legal policy and avoid release removal.
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> AFAIK, ASF mandates that users and developers have source code that they build from (source release), not that they run (binary release).
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> Thank you,
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> Vlad
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >>> On Feb 26, 2025, at 8:47 AM, Dongjoon Hyun <dongj...@apache.org> wrote:
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> Thank you for your reply, Sean.
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> I expected that argument exactly so that I started by quoting your sentence in the above.
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> I understood the reasoning in 2018. However, there are two reasons why I brought this again in 2025:
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> First, the open source spirit is technically and literally "no compiled code in a source release" like Apache Hadoop and Hive community does. Justin, Vlad, and Alex shared the same perspective to the Apache Spark PMC.
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> $ tar tvf apache-hive-4.0.1-src.tar.gz | grep 'jar$' | wc -l
>>>>>>>>>>>>>>>>> >>> 0
>>>>>>>>>>>>>>>>> >>> $ tar tvfz hadoop-3.4.1-src.tar.gz | grep 'jar$' | wc -l
>>>>>>>>>>>>>>>>> >>> 0
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> Second, last year, the open source communities were hit by CVE-2024-3094 ("XZ Utils Backdoor") in the world-wide manner where the backdoor was hidden in the test object. I believe most of us are aware of that. At that time, the GitHub repository was disabled.
>>>>>>>>>>>>>>>>> >>> As a member of Apache Spark PMC, I'm suggesting to remove that risk from the Apache Spark repository in 2025. I attached the following link to provide the XZ Utils history explicitly.
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> https://www.akamai.com/blog/security-research/critical-linux-backdoor-xz-utils-discovered-what-to-know
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> Although I agree that those test coverages are important, I don't think that's worthy for Apache Spark community to take a risk to be shutdown. That's the lesson which I've learned last year.
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> Sincerely,
>>>>>>>>>>>>>>>>> >>> Dongjoon.
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> On 2025/02/26 13:31:56 Sean Owen wrote:
>>>>>>>>>>>>>>>>> >>>> The gist of the initial 2018 thread was:
>>>>>>>>>>>>>>>>> >>>> These are not source .jar files that users use, but .jar files used to test loading of from .jar files. These are test resources only.
>>>>>>>>>>>>>>>>> >>>> I don't think this is what the spirit of the rule is speaking to, that the end-user code should always have source code, which is the right principle.
>>>>>>>>>>>>>>>>> >>>> Checking in the code somewhere is nice to have though and I think that was the idea here.
>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>> >>>> But, removing these and disabling potentially valuable tests seems like a step too far. There is no actual 'problem' w.r.t. the principle that users have source to the code they run.
>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>> >>>> The 2025 thread just retreads the same ground as the 2018 thread.
>>>>>>>>>>>>>>>>> >>>> But I don't see that we put this argument to the person who raised it again. Why not that first?
>>>>>>>>>>>>>>>>> >>>> And, if possible, go stick the source to these jars in the source tree, where available.
>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>> >>>> On Wed, Feb 26, 2025 at 1:08 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>> >>>>> Hi, All.
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>> Unfortunately, the Apache Spark project seems to have a technical debt in the source code releases. It happens to be discussed at least twice on both dev@spark and legal-discuss mailing lists. (Thank you for the heads-up, Vlad.)
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>> 1. https://lists.apache.org/thread/3sxw9gwp51mrkzlo2xchq1g20gbgbnz8 (2018-06-21, dev@spark)
>>>>>>>>>>>>>>>>> >>>>> 2. https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k (2018-06-25, legal-discuss@)
>>>>>>>>>>>>>>>>> >>>>> 3. https://lists.apache.org/thread/z3oq1db80vc8c7r6892hwjnq4h7hnwmd (2025-02-25, dev@spark)
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>> To be short, according to the previous conclusion in 2018, the Apache Spark community wanted to adhere to the ASF policy by removing those jar files from source code releases (although it was not considered as a release blocker at that time and until now).
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>>> it's important to be able to recreate these JARs somehow, and I don't think we have the source in the repo for all of them (at least, the ones that originate from Spark). That much seems like a must-do. After that, seems worth figuring out just how hard it is to build these artifacts from source. If it's easy, great. If not, either the test can be removed or we figure out just how hard a requirement this is.
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>> Given the unresolved issue for seven years, I proposed SPARK-51318 as a potential solution to comply with ASF policy. After SPARK-51318, we can recover the test coverage one by one later by addressing IDed TODO items without any legal concerns during the votes.
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>> https://issues.apache.org/jira/browse/SPARK-51318
>>>>>>>>>>>>>>>>> >>>>> (Remove `jar` files from Apache Spark repository and disable affected tests)
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>> WDYT?
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>> BTW, please note that I didn't define SPARK-51318 as a blocker for any on-going releases yet.
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>> Best regards,
>>>>>>>>>>>>>>>>> >>>>> Dongjoon.
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> ---------------------------------------------------------------------
>>>>>>>>>>>>>>>>> >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org