Not to spam everybody here, but I just wanted to let you know: the VS Code extension has been updated to v0.2.0, with support for tokenized BASIC .BA files.

- Tokenized .BA files open read-only, and the extension detokenizes them to text for your viewing pleasure.
- Duplicate line numbers are flagged as an error.
- Line numbers that fall outside the accepted range are flagged as an error.
- Unreachable code (code that isn't the target of any other line, or that follows a line that will always branch away) gets a warning.
- Lines that may tokenize to larger than 255 tokens get a warning.
- Fixed some minor bugs with incorrect highlighting.
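For anyone curious what checks like these look like in principle, here's a rough sketch of the duplicate/range/unreachable analysis in Python. This is my own illustration, not the extension's actual implementation; the `MIN_LINE`/`MAX_LINE` bounds, the `lint_basic` function, and the branch-detection regex are all assumptions for the sake of the example.

```python
import re

# Assumed valid range for line numbers; check your target BASIC's actual limits.
MIN_LINE, MAX_LINE = 0, 65529

def lint_basic(source: str):
    """Flag duplicate and out-of-range line numbers, plus simple unreachable code."""
    issues = []
    lines = []   # (number, body) in file order
    seen = set()
    for raw in source.splitlines():
        m = re.match(r"\s*(\d+)\s*(.*)", raw)
        if not m:
            continue
        num, body = int(m.group(1)), m.group(2)
        if num in seen:
            issues.append((num, "error: duplicate line number"))
        seen.add(num)
        if not (MIN_LINE <= num <= MAX_LINE):
            issues.append((num, "error: line number out of range"))
        lines.append((num, body))

    # Collect every line number that is the target of a GOTO/GOSUB/THEN.
    targets = set()
    for _, body in lines:
        for t in re.findall(r"(?:GOTO|GOSUB|THEN)\s+(\d+)", body, re.IGNORECASE):
            targets.add(int(t))

    # A line is unreachable if nothing jumps to it and the previous line
    # always branches away (unconditional GOTO, RETURN, or END).
    always_branches = re.compile(r"^\s*(GOTO\s+\d+|RETURN|END)\s*$", re.IGNORECASE)
    for prev, (num, _) in zip(lines, lines[1:]):
        if num not in targets and always_branches.match(prev[1]):
            issues.append((num, "warning: unreachable code"))
    return issues
```

For example, `lint_basic('10 GOTO 30\n20 PRINT "HI"\n30 END')` reports line 20 as unreachable: nothing jumps to it, and line 10 always branches away. A real linter would also have to handle multi-statement lines, ON GOTO/GOSUB, and conditional branches.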
On Mon, Dec 8, 2025 at 2:36 AM Joshua O'Keefe <[email protected]> wrote:
>
> > On Dec 7, 2025, at 7:29 PM, Andrew Ayers <[email protected]> wrote:
> >
> > you mentioned running smaller models locally...
> >
> > It's something that I've wanted to do, but I tend to wonder if I would have the hardware to do anything useful; the best GPU I have available is in an older model of the Oryx Pro from System76
>
> Feel free to reach out off-list if you like. As a general rule, inference performance is a matter of having gobs of VRAM with the highest memory bandwidth; for example, I fired up a 49B-parameter model just a few minutes ago on an AMD 7900XTX, which is a 24G device. With a relatively small context (32K-token rolling window), nearly half the model spills over into main memory for processing at a much slower rate. This unfortunate situation is tolerable mostly because my desktop machine has a 32-thread Zen 4 CPU and more memory than I have a right to shake sticks at.
>
> This is enough of a single-machine rig to do inference on modestly-quantized models (5-6 bpw) at this large size. Models that fit in VRAM scream. Model merging and quantization work is less stressful on the GPU and poses no problem. Training is something I farm out to runpod instances because I didn't buy a $7500 H100 when I had the money on hand.
>
> I just poked around one of my general-purpose models and it did a fairly mediocre job of producing 6502 assembly (a short exercise: put an Apple // in high-res graphics mode); it worked, sorta, but struggled a lot with platform specifics. Likewise, that particular model seemed to have very little information on M100 systems—it frequently confused them with the I/III, and tried very hard to write CP/M software or syncretic interpreted BASIC. I did manage to convince it of how PRINT@ works, though. I'll have to break out a code-gen-specific model to really put it to the test.
>
> In any case, if you'd like to get in touch I'm happy to help. One of my hobbies a few years ago was helping folks with normal, small-scale home systems be able to do LLM inference with readily available tools.
>
> For everyone else: I'll let folks know if anything list-relevant comes out of poking some of my code generation models. I'm... not especially hopeful. The exercise is fun, though.
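As a back-of-envelope illustration of the VRAM arithmetic in the quoted message: weights alone take roughly parameters × bits-per-weight / 8 bytes, so a 49B model at a mid-5s bpw quant already exceeds a 24G card before counting the KV cache and runtime overhead. The formula and the 5.5 bpw figure here are my assumptions, not exact numbers from any particular quant format.

```python
def model_weight_gb(params_billions: float, bpw: float) -> float:
    """Approximate weight footprint in GB: parameters x bits-per-weight / 8."""
    return params_billions * 1e9 * bpw / 8 / 1e9

weights = model_weight_gb(49, 5.5)  # roughly 33.7 GB of weights alone
spill = weights - 24                # portion that can't fit on a 24 GB 7900XTX
```

The KV cache for a 32K-token rolling window adds more on top of this, which is why so much of the model ends up spilling into main memory.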
