To the degree that we can draw conclusions from this three-month-old
thread, I submit that the top one is: ASF policy is not optimally clear.

(Personally, I think the *spirit* of the policy is clear, as embodied in the
TLDR at the bottom, but the text itself is not, and since it's a legal
document, that's a problem.)

I have a lot of sympathy for the viewpoint that we (contributors) are
responsible for what we submit and we (committers and PMC members) are
responsible for reviewing it. It shouldn't matter whether copyrighted code
was submitted by someone pasting from a GPLed repo or decompiling a
competitor's product, or by someone channeling GPT-5 or Sonnet 4.

I think the reason this (reasonably!) makes some people uncomfortable is
that there is no way to guarantee 100% that your AI assistant didn't
reproduce a GPLed section from its training data, nor is there a way for a
developer to reasonably check for such a thing. So unlike in the manual
labor scenario, you could, in theory, end up with inadvertent infringement.

The problem is that there isn't a solution for this, not really, not even
with an SBOM, which would end up as the software equivalent of "security
theater."
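
To make the "security theater" point concrete: the most an SBOM could
realistically record is self-attested provenance. A hypothetical
CycloneDX-style fragment might look something like this (the file name and
the provenance:* property names are made up for illustration, not a
proposal):

```
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "version": 1,
  "components": [
    {
      "type": "file",
      "name": "src/java/org/apache/cassandra/SomePatch.java",
      "properties": [
        { "name": "provenance:generator", "value": "claude-sonnet-4" },
        { "name": "provenance:human-reviewed", "value": "true" }
      ]
    }
  ]
}
```

Every value in there is supplied by the same developer whose diligence it is
supposed to verify, and none of it can be checked against the model's
training data, which is why I don't think it buys us anything.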

So the options I see are:
1. The developer + reviewer are responsible; we accept that mistakes may
happen and will fix them if and when they do.
2. We publish a list of approved models and accept that it will probably be
quietly ignored by a lot of people since it will be out of date within
weeks, but hey, at least we have legal cover.

Either way, I think we need to go back to ASF Legal to clarify the policy.

P.S. "Does don't-use-our-output-to-train-a-competitor language disqualify a
model/vendor" also seems to me to be plainly a question for Legal.

On Thu, Aug 14, 2025 at 12:26 PM Ariel Weisberg <ar...@weisberg.ws> wrote:

> Hi,
>
> I want to dig a little deeper into the actual ToS and make a distinction
> between terms that place a burden on the output of the model and terms
> that place a burden on access/usage.
>
> Here are the Claude consumer ToS that seem relevant:
> ```
> You may not access or use, or help another person to access or use, our
> Services in the following ways:
>
>    1. To develop any products or services that compete with our Services,
>    including to develop or train any artificial intelligence or machine
>    learning algorithms or models or resell the Services.
> ```
>
> And the commercial ToS:
> ```
>    1. *Use Restrictions.* Customer may not and must not attempt to (a)
>    access the Services to build a competing product or service, including to
>    train competing AI models or resell the Services except as expressly
>    approved by Anthropic; (b) reverse engineer or duplicate the Services; or
>    (c) support any third party’s attempt at any of the conduct restricted in
>    this sentence.
> ```
> One way to interpret this is that the burden is on the access/usage: if
> what you are doing when you access/use the service is acceptable, then the
> output is unencumbered. So, for example, if you are developing code for
> Apache Cassandra and you generate something for that purpose, then your
> access was not any one of (a) or (b), and it would be a very large stretch
> to say that contributing that code to the ASF constitutes (c).
>
> So unless I hear Legal say otherwise, I would say those ToS are acceptable.
>
> Now let's look at OpenAI's terms, which state:
> ```
>    - Use Output to develop models that compete with OpenAI.
> ```
> This is more concerning because it's a restriction on the output, not the access.
>
> Gemini has restrictions on "generating or distributing content that
> facilitates:... Spam, phishing, or malware,"
> and that is a little concerning because it sounds like it encumbers the
> output of the model, not the access.
>
> It really, really sucks to be in the position of trying to be a lawyer for
> every single service's ToS.
>
> Ariel
>
> On Thu, Aug 14, 2025, at 12:36 PM, Ariel Weisberg wrote:
>
> Hi,
>
> It's not up to us to interpret, right? It's been interpreted by Apache
> Legal, and if we are confused we can check, but this is one instance where
> they aren't being ambiguous or delegating the decision to us.
>
> I can't see how we can follow Legal's guidance and still accept output
> from models, or services running models, with these issues.
>
> This isn't even a change from what we settled on, right? We seemed to
> broadly agree that we wouldn't accept output from models that aren't
> license compatible. What has changed is that we have realized it applies
> to more models.
>
> At this point I don't think we should try to maintain a list. We should
> provide brief guidance that we don't accept code from models/services
> that are not license compatible (and highlight that this includes most
> popular services), and encourage people to watch out for models/services
> that might reproduce license-incompatible training data.
>
> Ariel
>
> On Fri, Aug 1, 2025, at 1:13 PM, Josh McKenzie wrote:
>
> So I'll go ahead and preface this email - I'm not trying to open Pandora's
> Box or re-litigate settled things from the thread. *But...*
>
>         • The terms and conditions of the generative AI tool do not place
> any restrictions on use of the output that would be inconsistent with the
> Open Source Definition. https://opensource.org/osd/
>
> By that logic, Anthropic's terms would also run afoul of that, right?
> https://www.anthropic.com/legal/consumer-terms
>
> You may not access or use, or help another person to access or use, our
> Services in the following ways:
> ...
> 2. To develop any products or services that compete with our Services,
> including to develop or train any artificial intelligence or machine
> learning algorithms or models or resell the Services.
> ...
>
>
> Strictly speaking, that collides with the Open Source Definition:
> https://opensource.org/osd
>
> 6. No Discrimination Against Fields of Endeavor
> The license must not restrict anyone from making use of the program in a
> specific field of endeavor. For example, it may not restrict the program
> from being used in a business, or from being used for genetic research.
>
>
> Which is going to hold true for basically all AI platforms. At least right
> now, they all have some form of restriction or verbiage discouraging the
> use of their services to build competing services.
>
> Gemini, similar terms <https://ai.google.dev/gemini-api/terms>:
>
> You may not use the Services to develop models that compete with the
> Services (e.g., Gemini API or Google AI Studio). You also may not attempt
> to reverse engineer, extract or replicate any component of the Services,
> including the underlying data or models (e.g., parameter weights).
>
> Plus a prohibited use clause.
>
> So ISTM we should either be OK with all of them (i.e., Cassandra doesn't
> compete with any of them, and it matches the definition of open source in
> the context of our project's usage) or OK with none of them. And I'm
> heavily in favor of the former interpretation.
>
>
