I am confused. Consensus was reached pretty clearly in
https://github.com/apache/spark/pull/50378, and CI passed. It now has 9 +1s
from all different groups.
Why do we need to change the approach? I don't think we should override the
community consensus because you think the approach is hacky.

On Wed, 26 Mar 2025 at 11:40, Rozov, Vlad <vro...@amazon.com.invalid> wrote:

> I think that there is some miscommunication/misunderstanding, so I’d like
> to clarify my view on the issue.
>
> 1. I don’t think there is a conflict. I think that overall almost all
> agree that having jar files in the Apache source release does not comply
> with the Apache release policy and they need to be removed.
> 2. The question is when and how to remove them. My initial assumption was
> that jars would be removed as part of 4.1.0 and backported to 3.5.x.
> 3. With the above assumption I voted -0 on 3.5.5 and opened
> https://github.com/apache/spark/pull/50231 as a WIP PR, with the plan to still
> vote -0 on 4.0 RC as long as jars are still part of the source release.
> 4. HyukjinKwon@ blocked that PR with -1 (
> https://github.com/apache/spark/pull/50231#issuecomment-2714485887)
> giving tests priority over ASF policy.
> 5. There was no indication that he (or somebody else) would work on
> removing the jars as part of the 4.1.0 release, as he cast a -1 veto.
> 6. That caused me to change my vote from -0 to -1 on the 4.0 release, as it
> sounded like the issue would not be addressed in 4.0 and 3.5.5, nor
> in 4.1.0 and 3.5.6.
> 7. The solution proposed by HyukjinKwon@ looks like a hack to me.
>
> To move forward, let’s mark SPARK-51318 as blocking for 4.1.0 and 3.5.6,
> remove -1 on https://github.com/apache/spark/pull/50231, agree that
> skipped tests would be fixed in the follow up PRs (4.1.x). Does that sound
> like a good plan to you?
>
> Thank you,
>
> Vlad
>
> On Mar 25, 2025, at 5:17 PM, Jungtaek Lim <kabhwan.opensou...@gmail.com>
> wrote:
>
> Vlad,
>
> We are conflicted because you immediately want the project to fix the
> issue, while Dongjoon stated in the post that he does not want to block the
> release just because of this. We delayed the release of Apache Spark 4.0.0
> a lot already (going to be month"s" now), and I do not want to see us
> enforcing a holistic solution immediately and blocking release due to this.
>
> If you raise this now but extend the timeline beyond Spark 4.0.0
> (say, to Spark 4.1.0), I think there would be no strong
> pushback on figuring out a long-term fix. You cast your -1 vote on the
> release (though non-binding), so we are trying to address this even
> with a short-term fix. What is wrong with this? Can you please allow
> a bit more time and not block the release if you really want to see
> the long-term fix rather than the short-term one?
>
> On Wed, Mar 26, 2025 at 8:48 AM Rozov, Vlad <vro...@amazon.com.invalid>
> wrote:
>
>> Please see inline.
>>
>> Thank you,
>>
>> Vlad
>>
>> On Mar 25, 2025, at 1:42 PM, Hyukjin Kwon <gurwls...@apache.org> wrote:
>>
>> > - the approach encourages keeping jar files in the Apache Spark repo
>> Yes, and removes it from source releases. I believe this is a minimized
>> change with AS-IS?
>>
>> Yes, it removes jars from the source release and satisfies the ASF
>> release policy (see item 3 in my e-mail). At the same time it makes the source
>> release differ from GitHub, including the release tag, and I don’t think
>> that in the long term this is the right approach.
>>
>>
>> > - it is hard to identify what tests are impacted by jars so they can be
>> properly fixed
>> We have a list of test jars, and I will add the CI to check this after
>> this PR.
>>
>> My question was regarding tests, not jars.
>>
>>
>> > - the solution relies on jar being present or not present on the
>> classpath. Tests may be skipped unintentionally. It is also very easy to
>> introduce new tests that do not skip if jar does not exist. Such test will
>> break only during release.
>> The tests themselves rely on how I check and skip the tests. Tests won't
>> pass on the other condition. In addition, we already have similar skipped
>> tests.
>>
>> For existing tests where you added the condition, the tests won’t pass, but they may
>> be incorrectly skipped if there is a bug in the condition. That will give the
>> wrong impression that a test exists and passes when it is actually skipped
>> by the condition. New tests that lack the condition may be added later,
>> and those will fail only during release.
>>
>>
>> > IMO, it is necessary to see if the source code for test jars is
>> available or can be reconstructed. If not, it is necessary to see how the
>> functionality still can be tested even if jar is not available. If the
>> source code is available, to keep the tests it is necessary to build jars
>> during tests or publish jars to maven and pull them as the test dependency.
>> I agree but this is orthogonal to my question?
>>
>> If you agree, why not temporarily disable the tests?
>>
>>
>> 1. From what I read, the actual concern from you I get is: "the solution
>> relies on jar being present or not present on the classpath...".
>> Maintaining test coverage is much more important than making the test
>> code slightly harder to read IMO.
>> I think Jungtaek explained it better at
>> https://github.com/apache/spark/pull/50378#pullrequestreview-2712679827.
>>
>> 2. I have 6 +1s in https://github.com/apache/spark/pull/50378. I will
>> merge this in 48 hours to resolve this issue. The community seems to agree
>> with this approach.
>>
>> I raised my concerns with the approach that relies on detecting the jar at
>> runtime and keeping UNLICENSED jar files on GitHub. If the
>> community agrees with your approach, I disagree and commit. Note that I
>> still have an outstanding comment on your PR
>> https://github.com/apache/spark/pull/50378#discussion_r2012935532.
>>
>>
>> PS: I’m a bit disappointed that my email requesting a video call was
>> ignored. Sometimes, a quick video call can save a lot of time compared to
>> texting.
>>
>> Sorry, but I did not receive any email requesting a video call. Where was the
>> request made? I am open to a video call, assuming that a summary will be
>> posted to the dev list.
>>
>> Note that I am disappointed that multiple requests to review my PRs were
>> ignored or left unanswered too. I also have an outstanding question on the
>> revert here
>> https://lists.apache.org/thread/o8047n1cp8nc0q8c2ndht82h28p8j9jq.
>>
>>
>>
>> On Wed, 26 Mar 2025 at 04:14, Rozov, Vlad <vro...@amazon.com.invalid>
>> wrote:
>>
>>> The policy [1] is quite clear and the fact that other projects do not
>>> include compiled jars (including test jars) into the source release
>>> confirms the rule:
>>>
>>> "Every ASF release MUST contain one or more source packages, which MUST
>>> be sufficient for a user to build and test the release provided they have
>>> access to the appropriate platform and tools. A source release SHOULD not
>>> contain compiled code."
>>>
>>> In addition to that UNLICENSED artifacts are against ASF policy as well.
>>>
>>> At this point there are 3 ways to approach the issue:
>>>
>>> 1. Release as is with jars.
>>> 2. Remove jars and disable affected tests. Enable individual tests once
>>> source code for those jars is provided.
>>> 3. Remove jars from the source release only and keep them in the GitHub
>>> repo.
>>>
>>> My vote is to proceed with 2 and I don’t see why it is not solving the
>>> issue in your opinion. At the end it is up to PMC members to decide and
>>> cast the vote.
>>>
>>> Thank you,
>>>
>>> Vlad
>>>
>>> [1] https://www.apache.org/legal/release-policy.html#artifacts
>>>
>>>
>>> On Mar 25, 2025, at 11:29 AM, Sean Owen <sro...@gmail.com> wrote:
>>>
>>> I personally think you are reading this too narrowly; the principle is,
>>> as given:
>>> "...MUST contain one or more source packages, which MUST be sufficient
>>> for a user to build and test the release..."
>>> "All releases are in the form of the source materials needed to make
>>> changes to the software being released."
>>>
>>> I don't think the status quo actually contravenes that.
>>> That said, everyone is in agreement to just clean this up.
>>> But I think your position isn't actually solving any problem that this
>>> principle is intended to prevent.
>>>
>>> On Tue, Mar 25, 2025 at 1:25 PM Rozov, Vlad <vro...@amazon.com.invalid>
>>> wrote:
>>>
>>>> I already cast my vote. To clarify, having compiled unlicensed jars
>>>> in the source release is strictly against ASF policy [1]. Between a tiny
>>>> chance that some tests and functionality will break and a small chance that
>>>> ASF will request that a long-awaited release be pulled due to the policy
>>>> violation, I’d rather choose to break those tests.
>>>>
>>>> Thank you,
>>>>
>>>> Vlad
>>>>
>>>> PS. In addition to Hive and Hadoop source releases that Dongjoon
>>>> checked, I checked Apache Flink and Beam and none of those releases
>>>> includes jars.
>>>>
>>>> [1]  https://www.apache.org/legal/release-policy.html
>>>>
>>>>
>>>> On Mar 25, 2025, at 8:46 AM, Holden Karau <holden.ka...@gmail.com>
>>>> wrote:
>>>>
>>>> So I think, if I understand folks’ concerns, it’s that we’ve let it slide
>>>> in the past and at some point we’ve got to stop letting it slide because
>>>> there is some concern we might not be meeting the ASF guidance here.
>>>>
>>>> Personally I think given they’re test artifacts and how delayed Spark 4
>>>> is we should not block the release but we can agree to block anything
>>>> beyond Spark 4 on this as a compromise.
>>>>
>>>> What do folks think?
>>>>
>>>> Twitter: https://twitter.com/holdenkarau
>>>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>>>> <https://www.fighthealthinsurance.com/?q=hk_email>
>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>> Pronouns: she/her
>>>>
>>>>
>>>> On Tue, Mar 25, 2025 at 8:43 AM Reynold Xin <r...@databricks.com.invalid>
>>>> wrote:
>>>>
>>>>> While I'd love to resolve this issue, I still don't understand why we
>>>>> would block the release for this.
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Mar 25, 2025 at 7:49 AM Rozov, Vlad <vro...@amazon.com.invalid>
>>>>> wrote:
>>>>>
>>>>>> The difference is in how tests are disabled.
>>>>>>
>>>>>> - the approach encourages keeping jar files in the Apache Spark repo
>>>>>> - it is hard to identify what tests are impacted by jars so they can
>>>>>> be properly fixed
>>>>>> - the solution relies on a jar being present or not present on the
>>>>>> classpath. Tests may be skipped unintentionally. It is also very easy to
>>>>>> introduce new tests that do not skip if the jar does not exist. Such tests will
>>>>>> break only during release.
>>>>>>
>>>>>> IMO, it is necessary to see if the source code for test jars is
>>>>>> available or can be reconstructed. If not, it is necessary to see how the
>>>>>> functionality still can be tested even if jar is not available. If the
>>>>>> source code is available, to keep the tests it is necessary to build jars
>>>>>> during tests or publish jars to Maven and pull them as the test dependency.
>>>>>>
>>>>>> Thank you,
>>>>>>
>>>>>> Vlad
>>>>>>
>>>>>> On Mar 24, 2025, at 11:52 PM, Hyukjin Kwon <gurwls...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>> What's the difference between disabling tests for dev and release vs
>>>>>> only for release?
>>>>>>
>>>>>> On Tue, 25 Mar 2025 at 15:36, Rozov, Vlad <vro...@amazon.com.invalid>
>>>>>> wrote:
>>>>>>
>>>>>>> Overall I don’t buy the solution where tests are skipped based on
>>>>>>> the presence of a jar file. It looks too fragile to me. What if there is a
>>>>>>> bug that does not add the jar to the classpath? The test would be skipped,
>>>>>>> not because the jar was deleted, but because the classpath is incorrect.
>>>>>>>
>>>>>>> Thank you,
>>>>>>>
>>>>>>> Vlad
>>>>>>>
>>>>>>> On Mar 24, 2025, at 7:56 PM, Hyukjin Kwon <gurwls...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Valid concern. Maybe we can mark tests ignored when those jars do
>>>>>>> not exist for now. So tagged commits will skip those tests. Dev commits will
>>>>>>> still test them.
>>>>>>>
>>>>>>> On Tue, 25 Mar 2025 at 11:47, Jungtaek Lim <
>>>>>>> kabhwan.opensou...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Maybe we should also check whether it is mandatory for the source code
>>>>>>>> distributed in a release to be able to pass the test suites? If this
>>>>>>>> is mandatory, we can't just modify the release script to simply remove the
>>>>>>>> jars, because this will break the tests in the source code distribution.
>>>>>>>>
>>>>>>>> Actually, my understanding is that tests should pass from the
>>>>>>>> source code and that the same artifacts we release can be built from it,
>>>>>>>> but I might be wrong.
>>>>>>>>
>>>>>>>> On Tue, Mar 25, 2025 at 11:32 AM Hyukjin Kwon <gurwls...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Made a PR first (https://github.com/apache/spark/pull/50378).
>>>>>>>>>
>>>>>>>>> BTW, I agree that we should have the source code along with the
>>>>>>>>> jars, and ideally the dev branch should not contain them as well. This is
>>>>>>>>> technical debt.
>>>>>>>>> For this, I hope we can improve this incrementally.
>>>>>>>>>
>>>>>>>>> I will also take a look and see if we can reject jars
>>>>>>>>> automatically in PRs or CI.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, 25 Mar 2025 at 11:15, Hyukjin Kwon <gurwls...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> So the issues are source releases (
>>>>>>>>>> https://github.com/apache/spark/tags) containing those jars,
>>>>>>>>>> right? Can we add the removal of test jars as part of the release
>>>>>>>>>> process?
>>>>>>>>>>
>>>>>>>>>> They aren't included in binary releases in any event so removal
>>>>>>>>>> on every source release should work.
>>>>>>>>>>
>>>>>>>>>> On Tue, 25 Mar 2025 at 10:51, Jungtaek Lim <
>>>>>>>>>> kabhwan.opensou...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Let's make this very clear - do we not have the source code to
>>>>>>>>>>> build a jar, or have no way to infer the source code used for the
>>>>>>>>>>> jar?
>>>>>>>>>>>
>>>>>>>>>>> I understand the concern, but if this is a huge issue, why has no
>>>>>>>>>>> one looked into this, and why are we just debating whether the affected
>>>>>>>>>>> tests need to be dropped/disabled or not? Whenever we add some test
>>>>>>>>>>> resources like a golden file, we tend to leave the part of the code that
>>>>>>>>>>> builds the golden file. Did we check and confirm that this is not the
>>>>>>>>>>> case for these jars and that we lost the source code to build them?
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Mar 25, 2025 at 9:35 AM Rozov, Vlad
>>>>>>>>>>> <vro...@amazon.com.invalid> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> First of all I don’t think that the conclusion on
>>>>>>>>>>>> https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k is
>>>>>>>>>>>> correct. Jar files included in the source release are compiled from
>>>>>>>>>>>> code, and replacing them with dat or jpeg files won’t work. Including jar
>>>>>>>>>>>> files in the source release is against ASF policy and my -1 will stay as
>>>>>>>>>>>> long as jars are included in the source release. As this is not the first
>>>>>>>>>>>> time this issue has been raised and no action was taken (actually more jars
>>>>>>>>>>>> were added), IMO, the issue should now be handled as a release blocker.
>>>>>>>>>>>>
>>>>>>>>>>>> I don’t see anything in the proposal that suggests that the fix
>>>>>>>>>>>> for SPARK-51318 is or should be blocked by an umbrella JIRA. The proposal was
>>>>>>>>>>>> to recover tests one by one. The PR that I have open will make it possible to
>>>>>>>>>>>> accomplish these tasks, as all disabled tests refer to
>>>>>>>>>>>> SPARK-51318.
>>>>>>>>>>>>
>>>>>>>>>>>> I can only help with SPARK-51318 at this point. Somebody else
>>>>>>>>>>>> will have to look into keeping tests enabled, as it requires source code for
>>>>>>>>>>>> the test jars.
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you,
>>>>>>>>>>>>
>>>>>>>>>>>> Vlad
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mar 24, 2025, at 4:55 PM, Hyukjin Kwon <gurwls...@apache.org>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> I still disagree with just disabling tests and removing the
>>>>>>>>>>>> jars without making sure that we will enable them back.
>>>>>>>>>>>> I want to EITHER make sure we have a plan and someone to drive it,
>>>>>>>>>>>> and the tests will be enabled back, OR have one fix that does it all.
>>>>>>>>>>>> Otherwise, my -1 stands if we can't be sure of that.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, 25 Mar 2025 at 08:51, Hyukjin Kwon <
>>>>>>>>>>>> gurwls...@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> From what I read in the last discussion in the legal thread (
>>>>>>>>>>>>> https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k),
>>>>>>>>>>>>> we don't really need to rush and block the release.
>>>>>>>>>>>>> I don't think we should block the release, remove the CI, and
>>>>>>>>>>>>> just remove the jars.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Rozov, the original proposal of this thread is 1. to first
>>>>>>>>>>>>> disable the tests, and 2. open an umbrella JIRA to enable individual tests.
>>>>>>>>>>>>> Since you're driving this, would you mind either making a
>>>>>>>>>>>>> proper fix in one go, or creating an umbrella JIRA to drive this?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, 24 Mar 2025 at 23:46, Rozov, Vlad
>>>>>>>>>>>>> <vro...@amazon.com.invalid> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Let’s open a formal vote on the subject. I have an open WIP PR
>>>>>>>>>>>>>> https://github.com/apache/spark/pull/50231 that is currently
>>>>>>>>>>>>>> blocked by -1.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Vlad
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mar 24, 2025, at 7:05 AM, Wenchen Fan <cloud0...@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It seems there’s no quick fix for this issue. Should we
>>>>>>>>>>>>>> remove these jars and disable the tests for now to comply with ASF policy?
>>>>>>>>>>>>>> While this would temporarily reduce test coverage until we refactor the
>>>>>>>>>>>>>> tests to avoid pre-compiled jars, we can encourage Spark vendors not to
>>>>>>>>>>>>>> cherry-pick this test-disabling commit so they can help report any test
>>>>>>>>>>>>>> failures. That said, since these tests are quite old and stable, failures
>>>>>>>>>>>>>> are unlikely.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Wenchen
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Mar 13, 2025 at 12:15 AM Rozov, Vlad
>>>>>>>>>>>>>> <vro...@amazon.com.invalid> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> There is a difference between technical debt and a legal
>>>>>>>>>>>>>>> issue. ASF may request that a release that does not meet ASF policy be pulled
>>>>>>>>>>>>>>> (and having tests is not ASF policy). IMO, SPARK-51318 should be a blocker
>>>>>>>>>>>>>>> for the next release or handled like a blocker.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thank you,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Vlad
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mar 10, 2025, at 6:02 PM, Jungtaek Lim <
>>>>>>>>>>>>>>> kabhwan.opensou...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +1 to Hyukjin. If the test is effective, we should
>>>>>>>>>>>>>>> definitely retain the effectiveness of the test, unless we end up with the
>>>>>>>>>>>>>>> conclusion that there is no way to do that.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Mar 11, 2025 at 9:29 AM Hyukjin Kwon <
>>>>>>>>>>>>>>> gurwls...@apache.org> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If we should fix, let's make sure we don't just disable the
>>>>>>>>>>>>>>>> tests - we will create another set of technical debt.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, 27 Feb 2025 at 09:11, Rozov, Vlad
>>>>>>>>>>>>>>>> <vro...@amazon.com.invalid> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I’ll look into the JIRA. Please assign it to me.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thank you,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Vlad
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> > On Feb 26, 2025, at 11:33 PM, Yang Jie <yangji...@apache.org> wrote:
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > +1, Agree to remove the jar files from the Apache Spark
>>>>>>>>>>>>>>>>> > repository and disable the affected tests.
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > For the current test scenarios that use jar files, I
>>>>>>>>>>>>>>>>> > believe we can definitely find a more reasonable testing approach.
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > Thanks,
>>>>>>>>>>>>>>>>> > Jie Yang
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > On 2025/02/26 16:57:45 "Rozov, Vlad" wrote:
>>>>>>>>>>>>>>>>> >> +1 on fixing test jars, though how it is fixed
>>>>>>>>>>>>>>>>> >> needs to be discussed, IMO. In the short term removing jars may still be
>>>>>>>>>>>>>>>>> >> the best option to satisfy ASF legal policy and avoid release removal.
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> AFAIK, ASF mandates that users and developers have
>>>>>>>>>>>>>>>>> >> source code that they build from (source release), not that they run
>>>>>>>>>>>>>>>>> >> (binary release).
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> Thank you,
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> Vlad
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >>> On Feb 26, 2025, at 8:47 AM, Dongjoon Hyun <dongj...@apache.org> wrote:
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> Thank you for your reply, Sean.
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> I expected exactly that argument, which is why I started by
>>>>>>>>>>>>>>>>> >>> quoting your sentence above.
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> I understood the reasoning in 2018. However, there are
>>>>>>>>>>>>>>>>> >>> two reasons why I brought this up again in 2025:
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> First, the open source spirit is technically and
>>>>>>>>>>>>>>>>> >>> literally "no compiled code in a source release", as the Apache Hadoop and
>>>>>>>>>>>>>>>>> >>> Hive communities practice. Justin, Vlad, and Alex shared the same perspective
>>>>>>>>>>>>>>>>> >>> with the Apache Spark PMC.
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> $ tar tvf apache-hive-4.0.1-src.tar.gz | grep 'jar$' | wc -l
>>>>>>>>>>>>>>>>> >>>      0
>>>>>>>>>>>>>>>>> >>> $ tar tvfz hadoop-3.4.1-src.tar.gz | grep 'jar$' | wc -l
>>>>>>>>>>>>>>>>> >>>      0
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> Second, last year, the open source communities were
>>>>>>>>>>>>>>>>> >>> hit worldwide by CVE-2024-3094 ("XZ Utils Backdoor"), where
>>>>>>>>>>>>>>>>> >>> the backdoor was hidden in a test object. I believe most of us are aware
>>>>>>>>>>>>>>>>> >>> of that. At that time, the GitHub repository was disabled. As a member of
>>>>>>>>>>>>>>>>> >>> the Apache Spark PMC, I'm suggesting that we remove that risk from the Apache Spark
>>>>>>>>>>>>>>>>> >>> repository in 2025. I attached the following link to provide the XZ Utils
>>>>>>>>>>>>>>>>> >>> history explicitly.
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> https://www.akamai.com/blog/security-research/critical-linux-backdoor-xz-utils-discovered-what-to-know
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> Although I agree that this test coverage is
>>>>>>>>>>>>>>>>> >>> important, I don't think it is worth it for the Apache Spark community to take the
>>>>>>>>>>>>>>>>> >>> risk of being shut down. That's the lesson I learned last year.
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> Sincerely,
>>>>>>>>>>>>>>>>> >>> Dongjoon.
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> On 2025/02/26 13:31:56 Sean Owen wrote:
>>>>>>>>>>>>>>>>> >>>> The gist of the initial 2018 thread was:
>>>>>>>>>>>>>>>>> >>>> These are not source .jar files that users use, but .jar files used to test
>>>>>>>>>>>>>>>>> >>>> loading from .jar files. These are test resources only.
>>>>>>>>>>>>>>>>> >>>> I don't think this is what the spirit of the rule is speaking to, that the
>>>>>>>>>>>>>>>>> >>>> end-user code should always have source code, which is the right principle.
>>>>>>>>>>>>>>>>> >>>> Checking in the code somewhere is nice to have though and I think that was
>>>>>>>>>>>>>>>>> >>>> the idea here.
>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>> >>>> But, removing these and disabling potentially valuable tests seems like a
>>>>>>>>>>>>>>>>> >>>> step too far. There is no actual 'problem' w.r.t. the principle that users
>>>>>>>>>>>>>>>>> >>>> have source to the code they run.
>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>> >>>> The 2025 thread just retreads the same ground as the 2018 thread.
>>>>>>>>>>>>>>>>> >>>> But I don't see that we put this argument to the person who raised it
>>>>>>>>>>>>>>>>> >>>> again. Why not that first?
>>>>>>>>>>>>>>>>> >>>> And, if possible, go stick the source to these jars in the source tree,
>>>>>>>>>>>>>>>>> >>>> where available.
>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>> >>>> On Wed, Feb 26, 2025 at 1:08 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
>>>>>>>>>>>>>>>>> >>>> wrote:
>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>> >>>>> Hi, All.
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>> Unfortunately, the Apache Spark project seems to have technical debt in
>>>>>>>>>>>>>>>>> >>>>> the source code releases. It has been discussed at least twice on both the
>>>>>>>>>>>>>>>>> >>>>> dev@spark and legal-discuss mailing lists. (Thank you for the heads-up,
>>>>>>>>>>>>>>>>> >>>>> Vlad.)
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>> 1. https://lists.apache.org/thread/3sxw9gwp51mrkzlo2xchq1g20gbgbnz8
>>>>>>>>>>>>>>>>> >>>>> (2018-06-21, dev@spark)
>>>>>>>>>>>>>>>>> >>>>> 2. https://lists.apache.org/thread/xmbgpgt30n7fdd99pnbg7983qzzrx24k
>>>>>>>>>>>>>>>>> >>>>> (2018-06-25, legal-discuss@)
>>>>>>>>>>>>>>>>> >>>>> 3. https://lists.apache.org/thread/z3oq1db80vc8c7r6892hwjnq4h7hnwmd
>>>>>>>>>>>>>>>>> >>>>> (2025-02-25, dev@spark)
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>> In short, according to the previous conclusion in 2018, the Apache
>>>>>>>>>>>>>>>>> >>>>> Spark community wanted to adhere to the ASF policy by removing those jar
>>>>>>>>>>>>>>>>> >>>>> files from source code releases (although it has not been considered a
>>>>>>>>>>>>>>>>> >>>>> release blocker, then or since).
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>>> it's important to be able to recreate these JARs somehow,
>>>>>>>>>>>>>>>>> >>>>>> and I don't think we have the source in the repo for all of them
>>>>>>>>>>>>>>>>> >>>>>> (at least, the ones that originate from Spark).
>>>>>>>>>>>>>>>>> >>>>>> That much seems like a must-do. After that, seems worth figuring out
>>>>>>>>>>>>>>>>> >>>>>> just how hard it is to build these artifacts from source.
>>>>>>>>>>>>>>>>> >>>>>> If it's easy, great. If not, either the test can be removed or
>>>>>>>>>>>>>>>>> >>>>>> we figure out just how hard a requirement this is.
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>> Given that the issue has been unresolved for seven years, I proposed SPARK-51318 as a
>>>>>>>>>>>>>>>>> >>>>> potential solution to comply with ASF policy. After SPARK-51318, we can
>>>>>>>>>>>>>>>>> >>>>> recover the test coverage one by one later by addressing IDed TODO items
>>>>>>>>>>>>>>>>> >>>>> without any legal concerns during the votes.
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>> https://issues.apache.org/jira/browse/SPARK-51318
>>>>>>>>>>>>>>>>> >>>>> (Remove `jar` files from Apache Spark repository and disable affected
>>>>>>>>>>>>>>>>> >>>>> tests)
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>> WDYT?
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>> BTW, please note that I didn't define SPARK-51318 as a blocker for any
>>>>>>>>>>>>>>>>> >>>>> on-going releases yet.
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>> Best regards,
>>>>>>>>>>>>>>>>> >>>>> Dongjoon.
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>>>>>> >>> To unsubscribe e-mail:
>>>>>>>>>>>>>>>>> dev-unsubscr...@spark.apache.org
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>>
>
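For anyone who wants to repeat the `tar ... | grep 'jar$' | wc -l` check quoted above without a shell, here is a minimal Python sketch. It is only an illustration and not part of the Spark release tooling. The function names `list_jars`, `build_sample_archive` and `demo`, and the sample file names, are made up for this example.

```python
import io
import os
import tarfile
import tempfile


def list_jars(archive_path):
    """Return the names of regular-file members whose name ends in .jar."""
    # "r:*" lets tarfile detect the compression (gzip, bz2, xz or none).
    with tarfile.open(archive_path, mode="r:*") as archive:
        return [member.name for member in archive.getmembers()
                if member.isfile() and member.name.endswith(".jar")]


def build_sample_archive(path, names):
    """Write a tiny gzip tarball of empty files (hypothetical sample data)."""
    with tarfile.open(path, mode="w:gz") as archive:
        for name in names:
            info = tarfile.TarInfo(name=name)
            info.size = 0
            archive.addfile(info, io.BytesIO(b""))


def demo():
    """Build a throwaway archive and report the jar files found in it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample-src.tar.gz")
        build_sample_archive(path, ["src/Main.scala", "lib/test-udf.jar"])
        return list_jars(path)


if __name__ == "__main__":
    jars = demo()
    print(len(jars), "jar file(s) found:", jars)
```

Pointing `list_jars` at a real source tarball should give an empty list when the release contains no jars, which corresponds to the `0` printed by the shell commands.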
