Not to spam everybody here, but I just wanted to let you know.

VS Code Extension updated to v0.2.0
- Support for tokenized BASIC (.BA) files. You can open them read-only and
the extension will detokenize them to text for your viewing pleasure.
- Duplicate line numbers are flagged as an error.
- Line numbers that fall outside the accepted range are flagged as an error.
- Unreachable code (code that isn't the target of any other line, or that
follows a line that will always branch away) gets a warning.
- Warning for lines that may tokenize to longer than 255 tokens.
- Fixed some minor bugs that caused incorrect highlighting.
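For anyone curious what the read-only detokenizer is doing under the hood, here is a minimal Python sketch. It assumes the usual Model 100 .BA layout (per line: a 2-byte little-endian next-line pointer, a 2-byte little-endian line number, token/ASCII bytes, then a 0x00 terminator, with a zero pointer ending the program). The token values in the table are placeholders for illustration, not the real M100 token table:

```python
import struct

# Placeholder token map -- the real Model 100 table has ~128 entries.
TOKENS = {0x89: "GOTO", 0xA2: "PRINT"}

def detokenize(data: bytes) -> str:
    """Turn a tokenized .BA byte stream into BASIC source text."""
    lines = []
    pos = 0
    while pos + 4 <= len(data):
        ptr, lineno = struct.unpack_from("<HH", data, pos)
        if ptr == 0:            # zero next-line pointer ends the program
            break
        pos += 4
        text = []
        while data[pos] != 0:   # accumulate bytes up to the terminator
            b = data[pos]
            text.append(TOKENS[b] if b in TOKENS else chr(b))
            pos += 1
        pos += 1                # skip the 0x00 line terminator
        lines.append(f"{lineno} {''.join(text)}")
    return "\n".join(lines)
```

The same walk over line records is what makes the duplicate/out-of-range line number checks cheap: you see every line number in order as you parse.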

On Mon, Dec 8, 2025 at 2:36 AM Joshua O'Keefe <[email protected]>
wrote:

> > On Dec 7, 2025, at 7:29 PM, Andrew Ayers <[email protected]> wrote:
> >
> >  you mentioned running smaller models locally...
> >
> > It's something that I've wanted to do, but I tend to wonder if I would
> have the hardware to do anything useful; the best GPU I have available is
> in an older model of the Oryx Pro from System76.
>
> Feel free to reach out off-list if you like. As a general rule inference
> performance is a matter of having gobs of VRAM with the highest memory
> bandwidth; for example, I fired up a 49B-parameter model just a few
> minutes ago on an AMD 7900XTX, which is a 24 GB device. With a relatively
> small context (32K tokens rolling window) nearly half the model spills over
> into main memory for processing at a much slower rate. This unfortunate
> situation is tolerable mostly because my desktop machine has a 32-thread
> Zen 4 CPU and more memory than I have a right to shake sticks at.
>
> This is enough of a single-machine rig to do inference on
> modestly quantized models (5-6 bpw) at this large size. Models that fit in
> VRAM scream. Model merging and quantization work is less stressful on the
> GPU and poses no problem. Training is something I farm out to runpod
> instances because I didn't buy a $7500 H100 when I had the money on hand.
>
> I just poked around one of my general purpose models and it did a fairly
> mediocre job of producing 6502 assembly (a short exercise: put an Apple //
> in high res graphics mode); it worked, sorta, but struggled a lot with
> platform specifics. Likewise that particular model seemed to have very
> little information on M100 systems—it frequently confused them with the
> Model I/III, and tried very hard to write CP/M software or syncretic
> interpreted BASIC.  I did manage to teach it how PRINT@ works, though.  I'll have
> to break out a code-gen specific model to really put it to the test.
>
> In any case, if you'd like to get in touch I'm happy to help. One of my
> hobbies a few years ago was helping folks with normal, small-scale home
> systems be able to do LLM inference with readily available tools. For
> everyone else: I'll let folks know if anything list-relevant comes out of
> poking some of my code generation models. I'm... not especially hopeful.
> The exercise is fun, though.
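For anyone following along with the VRAM arithmetic in Joshua's reply, a quick back-of-envelope sketch (illustrative numbers only; KV cache and runtime overhead, which are not modeled here, push the spill well past the weights alone):

```python
# Rough check of the spillover described above: a 49B-parameter model
# quantized to ~5.5 bits per weight against a 24 GB card.
def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in gigabytes."""
    return params * bits_per_weight / 8 / 1e9

weights = weight_gb(49e9, 5.5)    # ~33.7 GB of weights
spill = max(0.0, weights - 24.0)  # ~9.7 GB spills to main memory
```

Add a few gigabytes of KV cache for a 32K-token context and the "nearly half in main memory" figure becomes easy to believe.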
