Hi Simon,

> I think it could be improved by taking into this concern:
> 
>   All contributions that were produced using assisted coding tools must
>   make that clear as part of the contribution, to simplify the role of
>   the reviewer and future maintenance of the code.
> 
> I think this is consistent with normal traditional ethical behaviour to
> clarify the origin of contributions.
> 
> This is not that different from a git commit saying 'Run indent on
> code', or 'Update copyright years' or some other mechanic tool-generated
> code update.

I disagree with this one:

* Yes, it is traditional ethical behaviour to mention the origin of
  contributions. Here, the question is one of _origin_ vs. _tool_:
  If the LLM's output still has some human origin attached to it (other
  than the contributor), the copyright issue persists as well. That is
  why the proposed policy limits the size to a few lines.

* When a contributor submits a patch, which _tool_ they used to produce it
  is their own business. Did they do a large refactoring by hand? Or through
  repeated search-and-replace sessions? Or through some ad-hoc Emacs Lisp
  code? That's entirely their business.
  The reviewer should not be bothered with that. If some tool was unreliable,
  it was the contributor's task to perform after-the-fact checks to avoid
  mistakes.
  Effects on future maintenance of the code? There should be none.

* When we have a commit saying "Update copyright years by running
  'update-copyright'", the point is not to say how it was done, but to give
  a hint how it can be re-done the next time a similar change is needed.

> I worry that this may limit some reasonable uses of LLM coding
> assistants: test code.  Those tools can be used to generate large chunks
> of boring test code.  Test code is often harmless, and either they PASS
> or FAIL and we can test this continuously.  I see some value in allowing
> that, but it would be out of scope of the policy below.  The attack
> surface for test code is much smaller than actual running code.

It is true that in some projects/packages, such automatically generated unit
tests can be welcome. However, the acceptance of such test code would imply
two assumptions:

  * The copyright/license issue is not present (how to guarantee that?) or
    is less relevant (well, really? is a legal copyright issue less relevant
    just because it's only in test code?).
    When asking an LLM for unit tests for e.g. the 'regex' module, I would
    actually expect some LLM to reproduce the unit test from some BSD platform.

  * Test code is less frequently touched during maintenance, therefore it is
    OK if such test code is larger than what a human would write.

I have seen projects where massive code repetition in unit tests was
considered OK; the effect was that maintenance of these unit tests became
practically impossible. Therefore, regarding Gnulib, IMO it is
desirable not to inflate the size of the unit tests. I'm not sure if
you can tell an LLM "write unit tests but omit logically redundant ones"?

Bruno



