On 3/8/26 15:35, B 9 wrote:
On Sun, Mar 8, 2026 at 6:03 AM Stephen Adolph wrote:
Two extensions that would be nice to have. What do you think?
1. Some way to identify and adjust addresses to enable relocatable
code. I know teeny has this feature. Can we have another encoding
element for that?
/Having/ an encoding element is easy since there are so many “invalid”
codes — for instance, !r could be prefixed before two byte addresses.
/Using/ it, however, is another matter. As Brian points out, the program
to generate such code may be tricky for arbitrary programs. I don’t know
much about doing that yet, but it seems to me you’d need to start from
the source assembly code, not just a .CO file. Also, there’s a beauty to
the encoding right now: there is exactly one escape character (|!|) and
it does exactly one thing (flip the high bit of the next character). I
know all real-world programs differ from the crystalline beauty of the
original concept, but there’s also something to be said for having
small, sharp tools that do only one thing but do them remarkably well.
An optional extension would also be contrary to the direction Brian was
heading, with his data and loader being separable. My program already
ties the loader and data together as I’m just seeing this encoding as a
way to transmit a file to a Model T, not an archival format, but I
recognize the loss of generality.
Investigating HXFER <https://github.com/LivingM100SIG/Living_M100SIG/blob/main/M100SIG/Lib-10-TANDY200/HXFER.DOC>
is on my to do list.
Instead of marking the changes needed for relocation inline, as we would
be doing, it simply appends bytes at the end of a hex file. While it is
inefficient compared to your !-encoding, it has some very nice features
including being future-proof: I was able to easily convert the HXFER
files to .CO without knowing anything about HXFER by simply using a
standard hex-to-binary tool (|xxd -r -p|) and truncating the file to the
LEN specified in the header. Will a future computer-archaeologist curse
us for inventing a new format? I think not since, unlike HXFER, our
method has the benefit of including the decoder with the data, but it’s
worth considering.
2. Adjust ROM calls per platform. This might require another
method altogether. Maybe there would need to be a library of calls
and some code to identify the call. More ambitious.
Not as crazy as it sounds. I’ve been struck repeatedly by how the
official ROM calls have persisted from the Kyotronic 85 to its kin,
usually with just the address changed. The main incompatibility that I
recall so far is the RS232 calls for the NEC PC-8201A which have a very
peculiar initialization string. That said, it is definitely more
ambitious and probably shouldn’t even be part of the encoding. At least
not at first. I’d want to see a proof of concept that’s able to convert
a non-trivial program (from assembly source code) to .CO files for more
than one machine. I see Model 100 / Tandy 200 conversion as the most
likely to be successful. I believe Kyotronic to Olivetti M10 should be
fairly similar, too.
I’d also been considering using |!| followed by digits to represent Run
Length Encoding (RLE) of the next non-digit character, similar to the
Sixel image protocol. However, that’s probably not a terribly useful
thing to do given that very few .CO files have the same byte repeated
more than three times.
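A quick way to check that claim for a given .CO file is to measure its runs. This is only a sketch: the function names are made up here, and the cost model (one |!|, the decimal digits, then the byte) is an assumption that ignores any escaping of the repeated byte itself.

```python
from itertools import groupby


def run_lengths(data: bytes) -> list[int]:
    """Length of each maximal run of identical bytes, in file order."""
    return [sum(1 for _ in group) for _, group in groupby(data)]


def rle_savings(data: bytes, min_run: int = 4) -> int:
    """Bytes saved if every run of at least min_run bytes were replaced
    by '!' + decimal count + the byte (assumed cost model)."""
    saved = 0
    for n in run_lengths(data):
        if n >= min_run:
            encoded_size = 1 + len(str(n)) + 1  # '!', digits, the byte
            saved += n - encoded_size
    return saved
```

Running |rle_savings(open("FILE.CO", "rb").read())| on a few real files would show whether the idea pays for itself.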
I’m not quite ready to work on any of these extensions yet as I’d like
to finish up my loader (100% functional, proper sanity checks,
documentation, and testing). Please bring up relocating in the encoding
and automagic translation again as they may get more traction in my
brain later.
Also have we landed on an agreed encoding?
Until we find the next hiccup, I think yes. The main features I believe are:
* All characters represent themselves except |!| and a character
preceded by |!|.
* A character preceded by |!| will have its high-bit flipped and the
|!| discarded.
* All Model T computers share a set of characters which should be encoded:
o A literal |!| (bang, 33) is encoded as characters 33 and 161,
which display as |!à| on a Model 100. ⁰
o |"| (double quote, 34) confuses BASIC when included in DATA
statements.
o |^Z| (control-Z, 26) signals the End Of File and cannot be
received over the serial port to a text file.¹
o All control characters (anything less than |Space|, 32) are
removed by the BASIC tokenizer with the exception of |Tab|.²
o |DEL| (delete, 127) is removed when a file or program is opened
with EDIT.
⁰ Other Kyotronic kin may not show anything for “high ASCII”, but the
characters are preserved.
¹ In fact, after a ^Z, the rest of the file will be missing!
² For simplicity, Brian’s co2ba also encodes |Tab| and |Space|, but
that’s still valid by this encoding.
Actually I stopped doing that since it's silly.
The code is nothing at all, and actually more generic without using
direct selection logic like "is the value less than 34".
It's far more useful to have it ask "is the value in the unsafe list ?"
and then the list can change any time for any reason in any way without
the user needing to change the code.
So the unsafe list, a special option that's just a shorthand for adding
127 to the unsafe list, the choice of "!", the xor value, the length of
the output lines, the line numbering, are all actually configurable
simply because there is no reason not to.
Like if I hadn't known about the special case of 127 because it never
bit me personally, and so I never added the EDITSAFE option just for
that, the script would still work for someone else who discovered they
needed that. They could just use the UNSAFE option to customize the list
of unsafe values.
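For illustration, here is a minimal sketch of the encoding with the unsafe list kept as data instead of hard-coded comparisons. It is not the actual co2ba code; the names |bang_encode|, |bang_decode|, |DEFAULT_UNSAFE| and |EDIT_SAFE| are invented for this example, and the defaults simply follow the list of characters given earlier in this message.

```python
# Control characters except Tab (9), plus the double quote (34).
DEFAULT_UNSAFE = frozenset((set(range(32)) - {9}) | {0x22})
# Same list with DEL (127) added, for files that will be opened with EDIT.
EDIT_SAFE = DEFAULT_UNSAFE | {0x7F}


def bang_encode(data: bytes, unsafe=DEFAULT_UNSAFE,
                escape: int = 0x21, xor: int = 0x80) -> bytes:
    """Prefix every unsafe byte (and the escape itself) with the escape
    byte and store the byte xor-ed with the given value."""
    out = bytearray()
    for b in data:
        if b == escape or b in unsafe:
            out.append(escape)
            out.append(b ^ xor)
        else:
            out.append(b)
    return bytes(out)


def bang_decode(text: bytes, escape: int = 0x21, xor: int = 0x80) -> bytes:
    """Drop each escape byte and xor the byte that follows it."""
    out = bytearray()
    flip = False
    for b in text:
        if flip:
            out.append(b ^ xor)
            flip = False
        elif b == escape:
            flip = True
        else:
            out.append(b)
    if flip:
        raise ValueError("dangling escape at end of data")
    return bytes(out)
```

Changing what counts as unsafe is then just a different set passed in, for example |bang_encode(data, unsafe=EDIT_SAFE)|.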
One thing that's fixed right now is that the shift operation is always
xor, rather than anything you like, such as the +/-64 I originally had.
But for instance, xor 64 actually works too, and has the property that
127 doesn't become 255, nor would 255 become 127. I guess that doesn't
matter but I just always thought 255 was sometimes a problem too, maybe
in other venues that need to handle the file. I guess the utility of
being able to change the xor value would be if you need to encode
something that would end up making another value you can't have, so you
can shift the whole mess around and find something that works for your
particular case. Like, idk, some old System 7 Mac software that for
whatever reason doesn't like some byte, or maybe doesn't like some byte
*combo* because it interprets it. In fact bash by default interprets !
itself when you type it in manually or paste into a command line. So if
you were going to be dealing with encoded data on the command line, that
is a case where you might wish it were not any of the meaningful bash
characters.
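One way to test whether a given xor value "works" for a particular unsafe list is to check that no escaped byte lands back on a forbidden value. The sketch below is an illustration only; |xor_is_usable| and |MODEL_T_UNSAFE| are hypothetical names, and the unsafe set is the one listed earlier in this message.

```python
# Control characters except Tab, plus '!' (33), '"' (34) and DEL (127).
MODEL_T_UNSAFE = frozenset((set(range(32)) - {9}) | {0x21, 0x22, 0x7F})


def xor_is_usable(unsafe, escape: int, xor: int) -> bool:
    """True if xor-ing any forbidden byte never yields another forbidden
    byte (the escape byte counts as forbidden too)."""
    forbidden = set(unsafe) | {escape}
    return all((b ^ xor) not in forbidden for b in forbidden)
```

With this list both 128 and 64 pass. If 255 were also added to the list, 128 would fail (127 becomes 255) while 64 would still pass.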
I keep changing my mind about some things. Originally I was including
the ! and the 64 or 128 in the header line so that the header says how
to decode the payload along with its size and exe address and name.
Then made those not configurable and removed them from the header. But
they are configurable again and so they should probably be considered
metadata and be defined in the header. (the header data line)
I do consider the user free to generate a loader with whatever encoding
options they may want for whatever their own reasons may be. So as far
as I'm concerned there doesn't really need to be a religious consensus
for the payload format, because the payload and its decoder are
generated together. It's good enough for me that the generator is itself
a published and easily available thing after the fact, unlike so many of
the exotic loaders that currently exist. Sure they technically include
the code to decode, but the best ones include a binary blob that is
essentially inscrutable. It exists, and it is physically possible to trace
its execution with an 80C85 datasheet and an M100 reference manual...
I also now added a METHOD option that generates different encoding types.
So you can generate a loader that uses the quasi hex pair method James
Yi and Kurt McCullum use.
And the even cruder simple direct csv ints.
There is not really any need I can see to use them for real, but it's
interesting to generate the exact same payload the different ways where
that is the only difference, just to compare them.
Basically they all end up taking the exact same full run time, because
the faster ones are exactly offset by their increased transfer time
(though that may not be true if you're not transferring as slow as I am).
But of course the smaller storage size is an important difference so the
new way is definitely the way to go.
METHOD A (default) = What do we call it? I was going to call it Adolph
encoding, or "the quite ok encoding" (a joke referencing an image
compression that came out a few years ago), but actually it's the result
of all 3 of us at this point. We aren't doing exactly what Steve was and
we each changed something. I wrote Adolph/B9/White in a comment in the
script just to have some sort of label and get both your names in there.
B = Identical data as A but the loader code is implemented a different way.
H = hex pairs - but like James Yi and Kurt McCullum use, where the
alphabet is like a-p and treated as "(byte-97)*16 + (nextbyte-97)".
I = The good old practically no code required ints. Just read & poke in
a loop once for every byte. The loop iterator is just the address
itself. The entire loop is just the tail end of the first and only line
of code! But each byte takes 2 to 4 bytes...
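To compare payload sizes for methods H and I, a rough sketch like the following may help. It is not the generator's own code: the function names are invented, and it assumes each letter a-p carries one 4-bit nibble, high nibble first.

```python
def encode_hex_pairs(data: bytes) -> str:
    """Method H style: two letters a..p per byte, high nibble first."""
    return "".join(chr(97 + (b >> 4)) + chr(97 + (b & 15)) for b in data)


def decode_hex_pairs(text: str) -> bytes:
    """Inverse of encode_hex_pairs: (first-97)*16 + (second-97)."""
    return bytes(
        (ord(text[i]) - 97) * 16 + (ord(text[i + 1]) - 97)
        for i in range(0, len(text), 2)
    )


def encode_ints(data: bytes) -> str:
    """Method I style: comma-separated decimal values for DATA lines."""
    return ",".join(str(b) for b in data)
```

Hex pairs always cost exactly 2 characters per byte, while the ints cost 1 to 3 digits plus a separator, which is where the 2 to 4 bytes per byte comes from.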
And so at some point I hope to be able to add other methods like a fancy
machine language option based on whatever you develop, and that should
probably also advance over time to include yet more options to use
actual compression.
I had thought about RLE too. I think it's been shown to more than pay
for itself even though it's so simple and doesn't usually gain much.
It's like low hanging fruit so I was probably going to do it sometime
just to see how it works out.
Basically it's cool to have a loader generator instead of a loader, so
you can spit out variations and test & compare all kinds of things
easily. It's more work to get from zero to that first level of
indirection, where the loader generator produces the final output you
want (vs say just handcrafting a loader for one file directly),
but once you do have a generic loader generator working even just for
the simple easy cases where the binary isn't weird, it's easy to add
tiny incremental additions to that and after a while you really have
something cool without ever having to invest in some big project that
wouldn't seem worth it.
Note that all high-ASCII codes are preserved and do not need to be escaped.
The main unknown about this encoding is what to refer to it as. We can
keep calling it Stephen Adolph’s encoding, since you suggested it, but I
hah! what was I just saying... hehe
know I personally wouldn’t like having any idea named after me unless I
was done coming up with new ideas. I have a feeling you’re not done yet,
so in my head I’ve been calling it !-encoding (pronounced “bang
encoding”). What do you think it should be called?
bangcode/!code works for me even though my generator will spit out
anything you want in place of !. It needs to be short, so describing all
the actual variable details is out. We could be bold and claim t-code
for the rest of time. The encoding needed for Model T's, because it's
the list of which are the illegal bytes that's different. Or A-code.
What encoding is that? Oh it's just a code.
Maybe it could be the first ever legitimate reason to say "exscape"
since it's exclamation escape coding.
I just realized... when using ! in particular, the coding is also quite
literal. As you read the data, !a is in fact literally not a.
I was thinking of using space as the prefix for relocatable
placeholders. It sounds bad, but it's one of the other free values
below 34 and better than tab, and I think I kind of like the idea that
it would make all the relocate objects stand out. Tab would actually
maybe be even better for that, not just because they stand out even
more, but because it would mean all the normal tabs would get encoded
and collapsed.
Then maybe space could be for rle?
--
bkw