On Sun, Mar 8, 2026 at 6:03 AM Stephen Adolph wrote:

> Two extensions that would be nice to have.  What do you think?
>
> 1.  Some way to identify and adjust addresses to enable relocatable code.
> I know teeny has this feature.  Can we have another encoding element for
> that?
>
*Having* an encoding element is easy since there are so many “invalid”
codes — for instance, !r could be prefixed before two byte addresses.

*Using* it, however, is another matter. As Brian points out, the program to
generate such code may be tricky for arbitrary programs. I don’t know much
about doing that yet, but it seems to me you’d need to start from the
source assembly code, not just a .CO file. Also, there’s a beauty to the
encoding right now: there is exactly one escape character (!) and it does
exactly one thing (flip the high bit of the next character). I know all
real-world programs differ from the crystalline beauty of the original
concept, but there’s also something to be said for having small, sharp
tools that do only one thing but do it remarkably well.

An optional extension would also be contrary to the direction Brian was
heading, with his data and loader being separable. My program already ties
the loader and data together as I’m just seeing this encoding as a way to
transmit a file to a Model T, not an archival format, but I recognize the
loss of generality.

Investigating HXFER
<https://github.com/LivingM100SIG/Living_M100SIG/blob/main/M100SIG/Lib-10-TANDY200/HXFER.DOC>
is on my to-do list. Instead of marking the changes needed for relocation
inline, as we would be doing, it simply appends bytes at the end of a hex
file. While it is inefficient compared to your !-encoding, it has some very
nice features including being future-proof: I was able to easily convert
the HXFER files to .CO without knowing anything about HXFER by simply using
a standard hex-to-binary tool (xxd -r -p) and truncating the file to the LEN
specified in the header. Will a future computer-archaeologist curse us for
inventing a new format? I think not since, unlike HXFER, our method has the
benefit of including the decoder with the data, but it’s worth considering.
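
For anyone without xxd handy, the same conversion can be sketched in a few
lines of Python. This is only an illustration: hex_to_co is a made-up name,
and since I haven't described where LEN sits in the header, the sketch assumes
the caller already knows how many bytes to keep.

```python
def hex_to_co(hex_text: str, length: int) -> bytes:
    """Convert plain hex text to binary and keep the first `length` bytes.

    Roughly what `xxd -r -p` followed by a truncate does. `length` is
    supplied by the caller (assumption: it was read from the file's
    header by other means). Requires Python 3.7+ so that fromhex
    skips all ASCII whitespace, including newlines.
    """
    raw = bytes.fromhex(hex_text)
    # Anything past `length` (e.g. appended relocation bytes) is dropped.
    return raw[:length]
```

For example, hex_to_co("C3 00\nF0 FF FF", 3) keeps only the first three
bytes and drops the trailing two.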

> 2.  Adjust ROM calls per platform.   This might require another method
> altogether.  Maybe there would need to be a library of calls and some code
> to identify the call.  More ambitious.
>
Not as crazy as it sounds. I’ve been struck repeatedly by how the official
ROM calls have persisted from the Kyotronic 85 to its kin, usually with
just the address changed. The main incompatibility that I recall so far is
the RS232 calls for the NEC PC-8201A which have a very peculiar
initialization string. That said, it is definitely more ambitious and
probably shouldn’t even be part of the encoding. At least not at first. I’d
want to see a proof of concept that’s able to convert a non-trivial program
(from assembly source code) to .CO files for more than one machine. I see
Model 100 / Tandy 200 conversion as the most likely to be successful. I
believe Kyotronic to Olivetti M10 should be fairly similar, too.

I’d also been considering using ! followed by digits to represent Run
Length Encoding (RLE) of the next non-digit character, similar to the Sixel
image protocol. However, that’s probably not a terribly useful thing to do
given that very few .CO files have the same byte repeated more than three
times.
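
The claim about repeated bytes is easy to check against any given .CO file.
Here is a rough Python sketch that lists the runs longer than a threshold;
the names run_lengths and long_runs are mine, purely for illustration, and
nothing here is part of the encoding.

```python
from itertools import groupby


def run_lengths(data: bytes):
    """Return (byte_value, run_length) pairs for consecutive equal bytes."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(data)]


def long_runs(data: bytes, threshold: int = 3):
    """Return only the runs strictly longer than `threshold` bytes."""
    return [(value, n) for value, n in run_lengths(data) if n > threshold]
```

Calling long_runs(open("PROG.CO", "rb").read()) on a real file (PROG.CO is a
placeholder name) would show whether RLE has anything to compress.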

I’m not quite ready to work on any of these extensions yet as I’d like to
finish up my loader (100% functional, proper sanity checks, documentation,
and testing). Please bring up relocating in the encoding and automagic
translation again as they may get more traction in my brain later.

> Also have we landed on an agreed encoding?
>
Until we find the next hiccup, I think yes. The main features I believe are:

   - All characters represent themselves except ! and a character preceded
   by !.
   - A character preceded by ! will have its high-bit flipped and the !
   discarded.
   - All Model T computers share a set of characters which should be
   encoded:
      - A literal ! (bang, 33) is encoded as characters 33 and 161, which
      display as !à on a Model 100. ⁰
      - " (double quote, 34) confuses BASIC when included in DATA
      statements.
      - ^Z (control-Z, 26) signals the End Of File and cannot be received
      over the serial port to a text file.¹
      - All control characters (anything less than Space, 32) are removed
      by the BASIC tokenizer with the exception of Tab.²
      - DEL (delete, 127) is removed when a file or program is opened with
      EDIT.

⁰ Other Kyotronic kin may not show anything for “high ASCII”, but the
characters are preserved.
¹ In fact, after a ^Z, the rest of the file will be missing!
² For simplicity, Brian’s co2ba also encodes Tab and Space, but that’s
still valid by this encoding.

Note that all high-ASCII codes are preserved and do not need to be escaped.
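
To make the rules above concrete, here is a minimal sketch in Python. It is
not the code from my loader or from Brian's co2ba; the names bang_encode,
bang_decode and needs_escape are hypothetical, and it escapes only the
characters listed above (so Tab and Space pass through untouched).

```python
ESCAPE = 0x21  # '!' (bang)


def needs_escape(b: int) -> bool:
    """True for bytes the list above says should be encoded."""
    return (
        b == ESCAPE                   # a literal '!'
        or b == 0x22                  # double quote
        or b == 0x7F                  # DEL
        or (b < 0x20 and b != 0x09)   # control characters except Tab
    )


def bang_encode(data: bytes) -> bytes:
    """Prefix each troublesome byte with '!' and flip its high bit."""
    out = bytearray()
    for b in data:
        if needs_escape(b):
            out.append(ESCAPE)
            out.append(b ^ 0x80)
        else:
            out.append(b)  # includes all high-ASCII bytes, unescaped
    return bytes(out)


def bang_decode(encoded: bytes) -> bytes:
    """Drop each '!' and flip the high bit of the byte that follows it."""
    out = bytearray()
    flip = False
    for b in encoded:
        if flip:
            out.append(b ^ 0x80)
            flip = False
        elif b == ESCAPE:
            flip = True
        else:
            out.append(b)
    return bytes(out)
```

With this sketch, bang_encode(b"!") gives bytes 33 and 161, matching the
first sub-item in the list, and bang_decode(bang_encode(x)) returns x for
any byte string x.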

The main unknown about this encoding is what to refer to it as. We can keep
calling it Stephen Adolph’s encoding, since you suggested it, but I know I
personally wouldn’t like having any idea named after me unless I was done
coming up with new ideas. I have a feeling you’re not done yet, so in my
head I’ve been calling it !-encoding (pronounced “bang encoding”). What do
you think it should be called?

—b9
