Hi Riccardo,

This first pass was very much meant as a test of Claude Code against a
well-optimized codebase - GNUstep has many hours of work by those "skilled
in the art...". Unlike an assessment of an early-career programmer's code
(or a first pass through some "vibe coding" script build), we expected this
to be a real test for Claude, given how much runtime GNUstep has had over
the years. Internally we use GNUstep for high-performance computational
biology in multithreaded environments with CUDA code. My colleague Tom
MacSween and I have been coding in Objective-C since the NeXT days.

The intention of this experiment was to see what it would find and suggest
in a few passes. The first pass (which I presented above) was Claude Opus
4.6's initial instrumentation test. We checked the changes to ensure they
compile and pass basic unit and performance metrics tests. As some of you
noted, and most of us would agree, that is not the same as being exercised
in real production code environments.

The next pass, which we are working through now, is performance-testing
these changes against our own internal tests to see whether they yield any
actual performance gains. As we are monitoring it, Claude is finding
situations where it says "Oh, I think I made a mistake... need to
revert my change." - so it will be interesting to see where it ends up.

What we will do, as Richard asked, is break out each change that passes
our internal testing into a separate commit where we can, so that each can
be assessed individually by the maintainers for suitability to the project
repos. We have made the first pass available just to make you aware of what
the experiment is showing so far. For us internally, multi-threaded
performance and server-side considerations are our goal, but we are going
to take a reasonable pass through the GUI tools as well and see what it
thinks - but we won't be aggressively testing them ourselves. At David's
direction, we will not submit anything for libobjc2.

Regards,

Todd


On Mon, Apr 13, 2026 at 2:32 PM Riccardo Mottola <[email protected]>
wrote:

> Hi Todd,
>
> Todd White wrote:
> >
> > https://github.com/DTW-Thalion/gnustep-audit
> >
> > I wanted to share what we found and offer to contribute any or all of
> > the changes back upstream.
>
> Thanks for the work.
> I would best prefer to have separate PR-s so that each one can be
> analyzed, refuted, reworked or changed on.
> At a first glance, I cherry-picked a couple of commits in base and gui,
> there are interesting points to be analyzed.
>
> LLM generated suggestions are a hot topic these days, also among our
> community. On one side the question about ethics, on the other the noise
> they generate. Some members feel pressed by the generation of requests.
> Some project have banned them completely. I hope we don't bring GNUstep
> to that point, but keep a good line of usage.
>
> I have seen other attempts in AI usage with some fellow coders here, the
> advantage of here that the commits are retained atomic and so easier to
> single-check, refute or rewrite.
>
> I find it interesting that the whole codebase was checked.
>
>
> Question: Are any of these issues you found directly related to real
> bugs you found and open issues? Are you actively using GNUstep code?
> Or are the bugs found only by AI itself.
>
> I see some choices debatable, e.g. enlarging cache and buffer values,
> without hard data. Or using certain atomics function: this might work in
> specific environments, but break others. Having separate commits makes
> it easier to test, including running our own test suite (as limited as
> it currently is, though, in terms of architectures)
>
> Regards,
>
> Riccardo
>
