On 3/8/26 15:35, B 9 wrote:
On Sun, Mar 8, 2026 at 6:03 AM Stephen Adolph <[email protected] <mailto:[email protected]>> wrote:

    Two extensions that would be nice to have.  What do you think?

    1.  Some way to identify and adjust addresses to enable relocatable
    code.  I know teeny has this feature.  Can we have another encoding
    element for that?

/Having/ an encoding element is easy since there are so many “invalid” codes — for instance, !r could be prefixed before two byte addresses.

/Using/ it, however, is another matter. As Brian points out, the program to generate such code may be tricky for arbitrary programs. I don’t know much about doing that yet, but it seems to me you’d need to start from the source assembly code, not just a .CO file. Also, there’s a beauty to the encoding right now: there is exactly one escape character (|!|) and it does exactly one thing (flip the high bit of the next character). I know all real-world programs differ from the crystalline beauty of the original concept, but there’s also something to be said for having small, sharp tools that do only one thing but do them remarkably well.

An optional extension would also be contrary to the direction Brian was heading, with his data and loader being separable. My program already ties the loader and data together as I’m just seeing this encoding as a way to transmit a file to a Model T, not an archival format, but I recognize the loss of generality.

Investigating HXFER <https://github.com/LivingM100SIG/Living_M100SIG/blob/main/M100SIG/Lib-10-TANDY200/HXFER.DOC> is on my to-do list. Instead of marking the changes needed for relocation inline, as we would be doing, it simply appends bytes at the end of a hex file. While it is inefficient compared to your !-encoding, it has some very nice features including being future-proof: I was able to easily convert the HXFER files to .CO without knowing anything about HXFER by simply using a standard hex-to-binary tool (|xxd -r -p|) and truncating the file to the LEN specified in the header. Will a future computer-archaeologist curse us for inventing a new format? I think not since, unlike HXFER, our method has the benefit of including the decoder with the data, but it’s worth considering.
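A rough Python equivalent of the |xxd -r -p| plus truncate step might look like the sketch below. It assumes the decoded bytes begin with the usual 6-byte .CO header (load address, length, entry address, each 16-bit little-endian) and that LEN counts only the bytes after that header. Neither detail is confirmed by the message above, and the function name `hex_to_co` is made up for illustration.

```python
def hex_to_co(hex_text: str) -> bytes:
    """Decode plain hex text and cut it off after the .CO payload.

    Assumption: bytes 2-3 of the decoded data hold LEN (little-endian),
    as in a standard .CO header; anything past header + LEN is dropped.
    """
    # Remove all whitespace so line breaks in the hex file do not matter.
    raw = bytes.fromhex("".join(hex_text.split()))
    length = int.from_bytes(raw[2:4], "little")
    return raw[:6 + length]
```

Anything appended after the payload (such as relocation data) is simply discarded by the slice.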

    2.  Adjust ROM calls per platform.   This might require another
    method altogether.  Maybe there would need to be a library of calls
    and some code to identify the call.  More ambitious.

Not as crazy as it sounds. I’ve been struck repeatedly by how the official ROM calls have persisted from the Kyotronic 85 to its kin, usually with just the address changed. The main incompatibility that I recall so far is the RS232 calls for the NEC PC-8201A which have a very peculiar initialization string. That said, it is definitely more ambitious and probably shouldn’t even be part of the encoding. At least not at first. I’d want to see a proof of concept that’s able to convert a non-trivial program (from assembly source code) to .CO files for more than one machine. I see Model 100 / Tandy 200 conversion as the most likely to be successful. I believe Kyotronic to Olivetti M10 should be fairly similar, too.

I’d also been considering using |!| followed by digits to represent Run Length Encoding (RLE) of the next non-digit character, similar to the Sixel image protocol. However, that’s probably not a terribly useful thing to do given that very few .CO files have the same byte repeated more than three times.
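One way to check the claim about repeated bytes on a given file is to count its runs. This is only a measuring sketch, not an RLE encoder; the function name `long_runs` and the threshold of four are illustrative choices.

```python
from itertools import groupby

def long_runs(data: bytes, min_len: int = 4):
    """Return (byte_value, run_length) for each run of at least min_len equal bytes."""
    runs = []
    for value, group in groupby(data):
        n = sum(1 for _ in group)  # length of this run of identical bytes
        if n >= min_len:
            runs.append((value, n))
    return runs
```

Running it over the contents of a .CO file (read with `open(path, "rb").read()`) gives a quick feel for how much an RLE extension could save on that file.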

I’m not quite ready to work on any of these extensions yet as I’d like to finish up my loader (100% functional, proper sanity checks, documentation, and testing). Please bring up relocating in the encoding and automagic translation again as they may get more traction in my brain later.

    Also have we landed on an agreed encoding?

Until we find the next hiccup, I think yes. The main features I believe are:

  * All characters represent themselves except |!| and a character
    preceded by |!|.
  * A character preceded by |!| will have its high-bit flipped and the
    |!| discarded.
  * All Model T computers share a set of characters which should be encoded:
      o A literal |!| (bang, 33) is encoded as characters 33 and 161,
        which display as |!à| on a Model 100. ⁰
      o |"| (double quote, 34) confuses BASIC when included in DATA
        statements.
      o |^Z| (control-Z, 26) signals the End Of File and cannot be
        received over the serial port to a text file.¹
      o All control characters (anything less than |Space|, 32) are
        removed by the BASIC tokenizer with the exception of |Tab|.²
      o |DEL| (delete, 127) is removed when a file or program is opened
        with EDIT.

⁰ Other Kyotronic kin may not show anything for “high ASCII”, but the characters are preserved.
¹ In fact, after a ^Z, the rest of the file will be missing!


 ² For simplicity, Brian’s co2ba also encodes |Tab| and |Space|, but
that’s still valid by this encoding.

Actually I stopped doing that since it's silly.

The code is nothing at all, and actually more generic without using direct selection logic like "is the value less than 34".

It's far more useful to have it ask "is the value in the unsafe list?" and then the list can change any time for any reason in any way without the user needing to change the code.

So the unsafe list, a special option that's just a shorthand for adding 127 to the unsafe list, the choice of "!", the xor value, the length of the output lines, the line numbering, are all actually configurable simply because there is no reason not to.

Like if I hadn't known about the special case of 127 because it never bit me personally, and so I never added the EDITSAFE option just for that, the script would still work for someone else who discovered they needed that. They could just use the UNSAFE option to customize the list of unsafe values.
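As a sketch of the "is the value in the unsafe list?" approach (written in Python for illustration, not the actual script), the options could be carried in one small structure. The field names loosely follow the UNSAFE and EDITSAFE options mentioned above; the class, the defaults, and the function name are assumptions, and line length and line numbering are left out.

```python
from dataclasses import dataclass, field

@dataclass
class EncoderOptions:
    # Hypothetical defaults: control characters except Tab, plus '"'.
    unsafe: set = field(default_factory=lambda: (set(range(32)) - {9}) | {34})
    editsafe: bool = False      # shorthand for adding 127 to the unsafe list
    escape: int = ord("!")      # the prefix character
    xor: int = 128              # value xor-ed into an escaped byte

    def unsafe_set(self) -> set:
        # The escape character itself always has to be escaped.
        s = set(self.unsafe) | {self.escape}
        if self.editsafe:
            s.add(127)
        return s

def encode(data: bytes, opts: EncoderOptions) -> bytes:
    unsafe = opts.unsafe_set()
    out = bytearray()
    for b in data:
        if b in unsafe:                      # membership test, no magic numbers
            out += bytes([opts.escape, b ^ opts.xor])
        else:
            out.append(b)
    return bytes(out)
```

With this shape, someone who discovers a new problem byte only has to add it to `unsafe`; nothing in `encode` changes.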

One thing that's fixed right now is the fact that the shift operation is xor, and not whatever you want, like the way I originally just had +/-64.

But for instance, xor 64 actually works too, and has the property that 127 doesn't become 255, nor would 255 become 127. I guess that doesn't matter, but I just always thought 255 was sometimes a problem too, maybe in other venues that need to handle the file. I guess the utility of being able to change the xor value would be if you need to encode something that would end up making another value you can't have, so you can shift the whole mess around and find something that works for your particular case. Like, idk, some old System 7 Mac software that for whatever reason doesn't like some byte, or maybe doesn't like some byte *combo* because it interprets it. In fact bash by default interprets ! itself when you type it in manually or paste into a command line. So if you were going to be dealing with encoded data on the command line, that is a case where you might wish the prefix were not any of the meaningful bash characters.
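To see whether a given xor value "makes another value you can't have", one can check every unsafe byte against the unsafe list after flipping. The helper name `escaped_collisions` and the example unsafe set are illustrative assumptions, not taken from the script.

```python
def escaped_collisions(unsafe: set, xor: int) -> dict:
    """Map each unsafe byte to its escaped form when that form is itself unsafe."""
    return {b: b ^ xor for b in sorted(unsafe) if (b ^ xor) in unsafe}

# Example unsafe set: control characters except Tab, plus '!', '"' and DEL.
unsafe = (set(range(32)) - {9}) | {33, 34, 127}

print(127 ^ 128, 127 ^ 64)                       # 255 63
print(escaped_collisions(unsafe, 128))           # {}
print(escaped_collisions(unsafe, 64))            # {}
print(escaped_collisions(unsafe | {255}, 128))   # {127: 255, 255: 127}
print(escaped_collisions(unsafe | {255}, 64))    # {}
```

An empty result means every escaped byte lands on a safe value, so that xor value is usable with that unsafe list.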

I keep changing my mind about some things. Originally I was including the ! and the 64 or 128 in the header line so that the header says how to decode the payload along with its size and exe address and name. Then I made those not configurable and removed them from the header. But they are configurable again and so they should probably be considered metadata and be defined in the header. (the header data line)

I do consider the user free to generate a loader with whatever encoding options they may want for whatever their own reasons may be. So as far as I'm concerned there doesn't really need to be a religious consensus for the payload format, because the payload and its decoder are generated together. It's good enough for me that the generator is itself a published and easily available thing after the fact, unlike so many of the exotic loaders that currently exist. Sure they technically include the code to decode, but the best ones include a binary blob that is essentially inscrutable. It exists and it is physically possible to trace its execution with an 80c85 datasheet and m100 reference manual...


I also now added a METHOD option that generates different encoding types.

So you can generate a loader that uses the quasi hex pair method James Yi and Kurt McCullum use.

And the even cruder simple direct csv ints.

There is not really any need I can see to use them for real, but it's interesting to generate the exact same payload the different ways where that is the only difference, just to compare them.

Basically they all end up taking the exact same full run time, because the faster ones are exactly offset by their increased transfer time (though that may not be true if you're not transferring as slow as I am). But of course the smaller storage size is an important difference, so the new way is definitely the way to go.

METHOD A (default) = What do we call it? I was going to call it Adolph encoding, or "the quite ok encoding" (a joke referencing an image compression that came out a few years ago), but actually it's the result of all 3 of us at this point. We aren't doing exactly what Steve was and we each changed something. I wrote Adolph/B9/White in a comment in the script just to have some sort of label and get both your names in there.

B = Identical data as A but the loader code is implemented a different way.

H = hex pairs - but like James Yi and Kurt McCullum use, where the alphabet is like a-p and treated as "(byte-97)*16 + (nextbyte-97)".

I = The good old practically no code required ints. Just read & poke in a loop once for every byte. The loop iterator is just the address itself. The entire loop is just the tail end of the first and only line of code! But each byte takes 2 to 4 bytes...
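For comparing sizes, methods H and I can be sketched in a few lines of Python. This is an illustration under assumptions, not the generator's code: it assumes each letter a-p carries one 4-bit nibble (so the first letter of a pair is weighted by 16) and that the ints are separated by commas. The function names are made up.

```python
def encode_h(data: bytes) -> str:
    # Two letters per byte: high nibble then low nibble, 'a' = 0 ... 'p' = 15.
    return "".join(chr(97 + (b >> 4)) + chr(97 + (b & 15)) for b in data)

def decode_h(text: str) -> bytes:
    return bytes((ord(text[i]) - 97) * 16 + (ord(text[i + 1]) - 97)
                 for i in range(0, len(text), 2))

def encode_i(data: bytes) -> str:
    # Plain decimal ints: 1 to 3 digits plus a separator per byte.
    return ",".join(str(b) for b in data)

def decode_i(text: str) -> bytes:
    return bytes(int(x) for x in text.split(",")) if text else b""

sample = bytes(range(256))
print(len(sample), len(encode_h(sample)), len(encode_i(sample)))
```

Method H always costs exactly two characters per byte, while method I varies with the byte values, which matches the "2 to 4 bytes" estimate once the separator is counted.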

And so at some point I hope to be able to add other methods like a fancy machine language option based on whatever you develop, and that should probably also advance over time to include yet more options to use actual compression.

I had thought about RLE too. I think it's been shown to more than pay for itself even though it's so simple and doesn't usually gain much. It's like low hanging fruit so I was probably going to do it sometime just to see how it works out.

Basically it's cool to have a loader generator instead of a loader, so you can spit out variations and test & compare all kinds of things easily. It's more work to get from zero to that first level of indirection, where the loader generator produces the final output you want (vs say just handcrafting a loader for one file directly), but once you do have a generic loader generator working, even just for the simple easy cases where the binary isn't weird, it's easy to add tiny incremental additions to that, and after a while you really have something cool without ever having to invest in some big project that wouldn't seem worth it.




Note that all high-ASCII codes are preserved and do not need to be escaped.
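The rules listed above are small enough to sketch in a few lines of Python. This is a minimal illustration of the encoding as described in this message, not anyone's actual loader or generator; the names `bang_encode`, `bang_decode` and `UNSAFE` are made up here.

```python
ESCAPE = 0x21  # '!'

# Bytes to escape, per the list above: control characters except Tab,
# plus '!' (33), '"' (34) and DEL (127). High-ASCII bytes pass through.
UNSAFE = (set(range(32)) - {9}) | {0x21, 0x22, 0x7F}

def bang_encode(data: bytes) -> bytes:
    out = bytearray()
    for b in data:
        if b in UNSAFE:
            out.append(ESCAPE)
            out.append(b ^ 0x80)   # flip the high bit
        else:
            out.append(b)
    return bytes(out)

def bang_decode(text: bytes) -> bytes:
    out = bytearray()
    flip = False
    for b in text:
        if flip:
            out.append(b ^ 0x80)   # undo the flip, '!' already discarded
            flip = False
        elif b == ESCAPE:
            flip = True
        else:
            out.append(b)
    return bytes(out)

print(list(bang_encode(b"!")))     # [33, 161]
```

Because the decoder flips whatever follows the |!|, it also accepts input from an encoder that escapes more bytes than strictly necessary, such as Tab and Space.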

The main unknown about this encoding is what to refer to it as. We can keep calling it Stephen Adolph’s encoding, since you suggested it, but I

hah! what was I just saying... hehe


know I personally wouldn’t like having any idea named after me unless I was done coming up with new ideas. I have a feeling you’re not done yet, so in my head I’ve been calling it !-encoding (pronounced “bang encoding”). What do you think it should be called?

bangcode/!code works for me even though my generator will spit out anything you want in place of !. It needs to be short, so describing all the actual variable details is out. We could be bold and claim t-code for the rest of time: the encoding needed for Model T's, because it's the list of which bytes are illegal that's different. Or A-code. What encoding is that? Oh it's just a code.

Maybe it could be the first ever legitimate reason to say "exscape" since it's exclamation escape coding.

I just realized... when using ! in particular, the coding is also quite literal. As you read the data, !a is in fact literally not a.


I was thinking of using space as the prefix for relocatable placeholders. It sounds bad, but it's one of the other free values below 34 and better than tab, and I think I kind of like the idea that it would make all the relocate objects stand out. Tab would actually maybe be even better for that, not just because they stand out even more, but because it would mean all the normal tabs would get encoded and collapsed.

Then maybe space could be for rle?



--
bkw
