> On Dec 7, 2025, at 7:29 PM, Andrew Ayers <[email protected]> wrote:
> 
>  you mentioned running smaller models locally...
> 
> It's something that I've wanted to do, but I tend to wonder if I would have 
> the hardware to do anything useful; the best GPU I have available is in an 
> older model of the Oryx Pro from System76

Feel free to reach out off-list if you like. As a general rule, inference 
performance is a matter of having gobs of VRAM with the highest memory 
bandwidth; for example, I fired up a 49B-parameter model just a few minutes 
ago on an AMD 7900 XTX, which is a 24 GB device. With a relatively small 
context (a 32K-token rolling window), nearly half the model spills over into 
main memory for processing at a much slower rate. This unfortunate situation 
is tolerable mostly because my desktop machine has a 32-thread Zen 4 CPU and 
more memory than I have a right to shake sticks at.
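To make the spillover concrete, here's the back-of-envelope arithmetic. The 
figures in the sketch (5.5 bits per weight, 24 GB of VRAM) are my assumptions 
for illustration, not exact numbers from the setup above:

```python
# Rough VRAM math for a quantized model on a single consumer GPU.
# Assumed figures: a 49B-parameter model at ~5.5 bits per weight,
# on a 24 GB card (e.g. a 7900 XTX).

def model_footprint_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (decimal) for a quantized model."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

weights = model_footprint_gb(49, 5.5)       # ~33.7 GB of weights alone
vram = 24.0                                 # card capacity in GB
spill = max(0.0, weights - vram)            # ~9.7 GB pushed to system RAM
print(f"weights ~ {weights:.1f} GB, spill to system RAM ~ {spill:.1f} GB")
```

And that counts only the weights: the KV cache for a 32K-token window plus 
runtime buffers also have to live in VRAM, which pushes an even larger slice 
of the model out to system RAM and down to CPU-side memory bandwidth.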

This is enough of a single-machine rig to do inference on modestly quantized 
models (5-6 bpw) at this large size. Models that fit entirely in VRAM scream. 
Model merging and quantization work is less stressful on the GPU and poses no 
problem. Training is something I farm out to RunPod instances, because I 
didn't buy a $7,500 H100 when I had the money on hand.

I just poked around one of my general-purpose models and it did a fairly 
mediocre job of producing 6502 assembly (a short exercise: put an Apple // 
into high-res graphics mode); it worked, sorta, but struggled a lot with 
platform specifics. Likewise, that particular model seemed to have very little 
information on M100 systems: it frequently confused them with the Model I/III, 
and tried very hard to write CP/M software or a syncretic interpreted BASIC. 
I did manage to convince it of how PRINT@ works, though. I'll have to break 
out a code-generation-specific model to really put it to the test.

In any case, if you'd like to get in touch, I'm happy to help. One of my 
hobbies a few years ago was helping folks do LLM inference on normal, 
small-scale home systems with readily available tools. For everyone else: 
I'll let the list know if anything list-relevant comes out of poking some of 
my code generation models. I'm... not especially hopeful. The exercise is fun, 
though.
