On Wed, Dec 24, 2025 at 11:12 AM George M. Rimakis <[email protected]>
wrote:

> If you have any luck, let me know.
>
> Part of the reason I wanted to let Claude have direct access
> to VirtualT is that it was really struggling to debug things when the
> ML subroutines went wrong.
>
> It had a tendency to cache pointers, and by the time it got around to
> using them, they had drifted. Instead of jumping into its subroutine,
> it would jump into random garbage.
>
> So far, I’ve wasted a lot of time and tokens reminding it not to make
> the same mistakes over and over.
>
> Today, I’ve wasted most of my tokens on getting it to navigate VirtualT,
> upload a .DO file, enter BASIC, LOAD"prog.do" to tokenize, and RUN.
>
> The problem is that while Claude Code and Codex are perfectly fine at using
> the MCP as part of a single pre-defined plan, they're not really good at
> continuing without the human.
>
> So the entire testing plan needs to be defined up front. That makes it less
> agentic and more human-in-the-loop than I would have liked.
>
> -George
>
Yeah. Having to be the tester for a bad programmer is no fun.

The approach of letting it eat its own dogfood is much better, particularly
for assembly code where crashes are pretty much the default behavior.

High-level BASIC is effectively a set of guardrails and a sandbox that makes
the AI seem more capable than it is.
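The pointer-drift failure George describes has a classic fix on the Model 100:
never cache the address of an ML routine stored in a string, because BASIC's
garbage collector can move strings between calls. A minimal sketch (assuming
the usual M100 idiom of keeping machine code in a string, and a VARPTR string
descriptor laid out as length, address-low, address-high; the routine here is
a hypothetical one-byte stub):

```
10 ' Hypothetical one-byte ML routine: &HC9 is RET on the 8085
20 A$=CHR$(201)
30 ' Risky: any address cached here can go stale --
40 ' the string churn below may move A$
50 B$="work"+STR$(1)
60 ' Safer: re-derive the address from VARPTR just before CALL
70 D=VARPTR(A$):AD=PEEK(D+1)+256*PEEK(D+2)
80 CALL AD
```

The same applies to any address derived from VARPTR: treat it as stale after
the next string operation.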

My initial goal was to see if Gemini would test its own code against
CloudT. It refused.

-- John.
