On Sun, Mar 8, 2026 at 6:03 AM Stephen Adolph <[email protected]> wrote:
> Two extensions that would be nice to have. What do you think?
>
> 1. Some way to identify and adjust addresses to enable relocatable code.
> I know teeny has this feature. Can we have another encoding element for
> that?

*Having* an encoding element is easy since there are so many “invalid” codes; for instance, !r could be prefixed before two-byte addresses. *Using* it, however, is another matter. As Brian points out, the program to generate such code may be tricky for arbitrary programs. I don’t know much about doing that yet, but it seems to me you’d need to start from the source assembly code, not just a .CO file.

Also, there’s a beauty to the encoding right now: there is exactly one escape character (!) and it does exactly one thing (flip the high bit of the next character). I know all real-world programs differ from the crystalline beauty of the original concept, but there’s also something to be said for having small, sharp tools that do only one thing but do it remarkably well.

An optional extension would also be contrary to the direction Brian was heading, with his data and loader being separable. My program already ties the loader and data together as I’m just seeing this encoding as a way to transmit a file to a Model T, not an archival format, but I recognize the loss of generality.

Investigating HXFER <https://github.com/LivingM100SIG/Living_M100SIG/blob/main/M100SIG/Lib-10-TANDY200/HXFER.DOC> is on my to-do list. Instead of marking the changes needed for relocation inline, as we would be doing, it simply appends bytes at the end of a hex file. While it is inefficient compared to your !-encoding, it has some very nice features including being future-proof: I was able to easily convert the HXFER files to .CO without knowing anything about HXFER by simply using a standard hex-to-binary tool (xxd -r -p) and truncating the file to the LEN specified in the header.

Will a future computer-archaeologist curse us for inventing a new format? I think not since, unlike HXFER, our method has the benefit of including the decoder with the data, but it’s worth considering.

> 2. Adjust ROM calls per platform. This might require another method
> altogether. Maybe there would need to be a library of calls and some code
> to identify the call. More ambitious.

Not as crazy as it sounds. I’ve been struck repeatedly by how the official ROM calls have persisted from the Kyotronic 85 to its kin, usually with just the address changed. The main incompatibility that I recall so far is the RS232 calls for the NEC PC-8201A, which have a very peculiar initialization string.

That said, it is definitely more ambitious and probably shouldn’t even be part of the encoding. At least not at first. I’d want to see a proof of concept that’s able to convert a non-trivial program (from assembly source code) to .CO files for more than one machine. I see Model 100 / Tandy 200 conversion as the most likely to be successful. I believe Kyotronic to Olivetti M10 should be fairly similar, too.

I’d also been considering using ! followed by digits to represent Run Length Encoding (RLE) of the next non-digit character, similar to the Sixel image protocol. However, that’s probably not a terribly useful thing to do given that very few .CO files have the same byte repeated more than three times.

I’m not quite ready to work on any of these extensions yet as I’d like to finish up my loader (100% functional, proper sanity checks, documentation, and testing). Please bring up relocating in the encoding and automagic translation again as they may get more traction in my brain later.

> Also have we landed on an agreed encoding?

Until we find the next hiccup, I think yes. The main features, I believe, are:

- All characters represent themselves except ! and a character preceded by !.
- A character preceded by ! will have its high bit flipped and the ! discarded.
- All Model T computers share a set of characters which should be encoded:
  - A literal ! (bang, 33) is encoded as characters 33 and 161, which display as !à on a Model 100.⁰
  - " (double quote, 34) confuses BASIC when included in DATA statements.
  - ^Z (control-Z, 26) signals the End Of File and cannot be received over the serial port to a text file.¹
  - All control characters (anything less than Space, 32) are removed by the BASIC tokenizer with the exception of Tab.²
  - DEL (delete, 127) is removed when a file or program is opened with EDIT.

⁰ Other Kyotronic kin may not show anything for “high ASCII”, but the characters are preserved.
¹ In fact, after a ^Z, the rest of the file will be missing!
² For simplicity, Brian’s co2ba also encodes Tab and Space, but that’s still valid by this encoding.

Note that all high-ASCII codes are preserved and do not need to be escaped.

The main unknown about this encoding is what to call it. We can keep calling it Stephen Adolph’s encoding, since you suggested it, but I know I personally wouldn’t like having any idea named after me unless I was done coming up with new ideas. I have a feeling you’re not done yet, so in my head I’ve been calling it !-encoding (pronounced “bang encoding”). What do you think it should be called?

—b9
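
For reference, the rules listed above can be sketched in a few lines of Python. This is only an illustration, not the code of my loader or of co2ba: the names bang_encode, bang_decode, and must_escape are made up here, and escaping Tab along with the other control characters is an assumption made for simplicity (the encoding allows it but does not require it).

```python
# Minimal sketch of the !-encoding rules listed above.
# All names here are hypothetical, chosen for illustration only.

ESCAPE = 0x21  # '!' is the single escape character


def must_escape(b: int) -> bool:
    """Return True for bytes the list above says should be encoded."""
    return (
        b == ESCAPE    # a literal !
        or b == 0x22   # double quote, confuses BASIC in DATA statements
        or b < 0x20    # control characters, including ^Z (26);
                       # Tab is escaped too (an assumption, for simplicity)
        or b == 0x7F   # DEL, removed by EDIT
    )


def bang_encode(data: bytes) -> bytes:
    """Prefix each troublesome byte with ! and flip its high bit."""
    out = bytearray()
    for b in data:
        if must_escape(b):
            out.append(ESCAPE)
            out.append(b ^ 0x80)  # flip the high bit
        else:
            out.append(b)         # everything else represents itself
    return bytes(out)


def bang_decode(text: bytes) -> bytes:
    """Discard each ! and flip the high bit of the byte that follows it."""
    out = bytearray()
    it = iter(text)
    for b in it:
        if b == ESCAPE:
            nxt = next(it, None)
            if nxt is None:
                raise ValueError("dangling ! at end of input")
            out.append(nxt ^ 0x80)
        else:
            out.append(b)
    return bytes(out)
```

Encoding a lone ! with this sketch yields bytes 33 and 161, matching the first item in the list, and high-ASCII bytes pass through untouched.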
