On 3/10/26 22:31, B 9 wrote:
Nifty! I had been thinking about how to encode NULLs in a single byte since they are so much more common, but hadn’t known about the yEnc offset. Doing some histogram analysis on the most easily available batch <https://github.com/hackerb9/co2do/tree/histogram/histogram/kurtdekker/co> of .CO files, it does not look like 42 is an optimal offset. I like the idea of a bespoke offset! Here’s a sample program which can determine that for you for a single .CO file or a whole directory of them: histco.py <https://github.com/hackerb9/co2do/blob/histogram/histogram/histco.py>:

$ ./histogram/histco.py testfiles/ALTERN.CO
Unrotated: 4758 bytes. Can save 956 bytes (20.09%)
Rotation +143 => 3802 bytes.
Rotation of +42 would save 871 bytes (18.31%)

$ cd histogram/kurtdekker/co/
$ ../../histco.py
Unrotated: 111237 bytes. Can save 18353 bytes (16.50%)
Rotation +136 => 92884 bytes.
Rotation of +42 would save 6696 bytes (6.02%)

Super cool.


That's curious that you arrived at +143 for that file.

I just added a brute force scanner which tries all possible values 0-255 using xor and using rot, and for the same file I get +122 (rot 122).
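For reference, the two transforms being compared can be sketched in Python (function names are mine, not from co2ba or histco):

```python
def rot(data: bytes, n: int) -> bytes:
    """yEnc-style rotate: add an offset to every byte, mod 256."""
    return bytes((b + n) % 256 for b in data)


def xor(data: bytes, n: int) -> bytes:
    """Xor every byte with a fixed value."""
    return bytes(b ^ n for b in data)


# rot is undone by rotating by the complement; xor is its own inverse.
assert rot(rot(b"ALTERN", 122), 256 - 122) == b"ALTERN"
assert xor(xor(b"ALTERN", 122), 122) == b"ALTERN"
```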


$ time co2ba ALTERN.CO call |wc -c
4401

real    0m0.097s
user    0m0.070s
sys     0m0.031s

$ time XA=best co2ba ALTERN.CO call |wc -c
trying all possible XA values...
XA=+122
4354

real    0m5.571s
user    0m5.541s
sys     0m0.032s



For the XA variable I'm using a convention that ^val means xor by val, and +val means rotate by val.

Apparently it makes a difference because the best possible rotate did slightly better than the best possible xor, when all 256 possible values were tried. So it's not just a case that for every xor result there is some equal rot result just at some other location.

But I wonder if my scanner logic is bad because we both should have gotten the same value. I presume we both did the same rotate:

(byte+n)%256

I'm not doing anything efficient; it's super brute force.
I start with t=LEN and s=LEN*2, then walk the binary and do t++ every time I hit any of the unsafe values under xor N. Then at the end, if t<s then s=t. Here t is the total bytes when using xor N, and s is the smallest t seen so far. Repeat for rot N. Repeat both for N=0-255.
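The loop described above could be sketched like this in Python (a minimal sketch, assuming some set of "unsafe" byte values that each cost one extra escape byte after encoding; the actual unsafe set in co2ba isn't shown here, and the function name is mine):

```python
def best_offset(data: bytes, unsafe: set[int]) -> tuple[str, int, int]:
    """Try every xor and rot value 0-255; return (mode, n, cost).

    cost = len(data) plus one extra byte for each unsafe value that
    survives the transform, i.e. each unsafe byte needs an escape.
    """
    best = ("+", 0, 2 * len(data))  # s = LEN*2 sentinel, as in the text
    for n in range(256):
        for mode, f in (("^", lambda b: b ^ n),
                        ("+", lambda b: (b + n) % 256)):
            t = len(data) + sum(1 for b in data if f(b) in unsafe)
            if t < best[2]:
                best = (mode, n, t)
    return best
```

For an all-zero payload with only 0 unsafe, any nonzero xor or rot wins; the scanner reports the first one it finds.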

I am now also including 127 by default like you are. It's so useful that it's basically user-hostile not to, and it costs hardly anything, especially after this gain from the rotate. So there should be no difference from that.


With some debug echoes added:

$ XA=best ../co2ba.sh ALTERN.CO call |wc -c
trying all possible XA values...
^0 -> 4752
+0 -> 4752
^1 -> 4718
+1 -> 5552
^2 -> 4723
+2 -> 5514
^3 -> 4782
+3 -> 5531
^4 -> 4697
+4 -> 5515
...
^120 -> 3903
+120 -> 3815
^121 -> 3892
+121 -> 3814
^122 -> 3890
+122 -> 3806
^123 -> 3886
+123 -> 3807
...
+140 -> 3877
^141 -> 3839
+141 -> 3856
^142 -> 3837
+142 -> 3883
^143 -> 3834
+143 -> 3880
^144 -> 3846
+144 -> 3887
^145 -> 3840
+145 -> 3892
...
^248 -> 4704
+248 -> 3942
^249 -> 4697
+249 -> 3939
^250 -> 4705
+250 -> 3946
^251 -> 4702
+251 -> 3936
^252 -> 4640
+252 -> 3911
^253 -> 4702
+253 -> 3941
^254 -> 4699
+254 -> 3968
XA=+122  ->  3806 bytes
4354

And there are in fact 3806 bytes of payload after I manually remove the BASIC and line breaks, so at least I'm counting right. The loop that generates the payload is a totally separate thing later, and the two agree on the total at least. And the file works, passes the checksum, runs, etc.


Oh yeah, I switched to a rolling xor checksum too.
It only needs ints in BASIC, and I think it actually catches more errors.
It's still kind of Swiss cheese compared to real CRC algorithms, but those are expensive and this is cheap.
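As a minimal Python sketch (I'm assuming "rolling xor" here means a plain running xor of each byte into one accumulator, which is what the weakness described below implies; this is my paraphrase, not the actual loader code):

```python
def xor_checksum(data: bytes) -> int:
    """Running xor of every payload byte. The result always fits in
    0-255, well inside a 16-bit BASIC integer."""
    s = 0
    for b in data:
        s ^= b
    return s


# Order-insensitive, and any pair of identical bytes cancels out:
assert xor_checksum(b"\x00\xff\x0f") == 0xF0
```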

Another idea I tried before that was to just add the total byte count into the sum, by having the loader add an extra +1 per POKE. That way, even if the data was all 0's, as long as the +1's were actually added one at a time along the way, the sum would break if any bytes were lost, or added for that matter.
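That earlier scheme could be sketched like this (a hypothetical sketch; in the real loader the +1 would happen inside the POKE loop):

```python
def sum_plus_one_per_byte(data: bytes) -> int:
    """Additive checksum with an extra +1 per byte, so even an
    all-zero payload contributes its length to the total."""
    s = 0
    for b in data:
        s = s + b + 1  # the "+1 per poke"
    return s


# Ten zero bytes still produce a nonzero sum:
assert sum_plus_one_per_byte(b"\x00" * 10) == 10
```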

That worked, but I think the xor is even cheaper and still improves on the simple sum. I think the main weakness is with runs of repeating bytes, any value, not just 0: the way xoring the same two values exactly reverses itself means you can drop one byte and catch it, but if you drop two of the same byte in a row, you wouldn't know it. Those two bytes had no effect on the sum when they were both present.
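That failure mode is easy to demonstrate: dropping two adjacent copies of the same byte leaves a plain running xor unchanged.

```python
def xor_checksum(data: bytes) -> int:
    """Plain running xor of every byte."""
    s = 0
    for b in data:
        s ^= b
    return s


good = b"\x01\x42\x42\x02"
bad = b"\x01\x02"  # the two identical 0x42 bytes dropped in transit

assert xor_checksum(good) == xor_checksum(bad)  # error goes undetected
```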

Maybe I should go back to adding the total length into the final comparison after all. That should catch even more & freakier errors and it's still essentially free. I can even still keep the ints: the max possible file size (29.6k) plus the max possible checksum (255) is still way short of max int (32k).
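Combining the two — running xor plus the total length in the final comparison — catches that dropped-pair case, and the arithmetic stays under the signed 16-bit ceiling as noted. A sketch:

```python
def xor_plus_length(data: bytes) -> int:
    """Running xor of the payload plus its byte count. The max value
    is roughly 29.6k + 255, safely under 32767."""
    x = 0
    for b in data:
        x ^= b
    return x + len(data)


good = b"\x01\x42\x42\x02"
bad = b"\x01\x02"  # two identical adjacent bytes dropped

# The xor terms match, but the length term differs:
assert xor_plus_length(good) != xor_plus_length(bad)
```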


Also, I'm calling it !yenc. Because it's not yEnc, and yet it essentially is, so it's descriptive of its properties both ways.

I started on rle but it's kind of a transporter accident still. It sorta mostly lives...

--
bkw


—b9

P.S. A possible lesson for us on rolling our own encoding: Searching for sample .CO files to test with my histogram program, I found a ZIP file of Kurt Dekker’s games on Bitchin 100 <https://bitchin100.com/m100-oss/archive.html>. Kurt actually released those in his own DEC format <https://github.com/hackerb9/co2do/blob/histogram/histogram/kurtdekker/util/FTU.TXT>. I downloaded the link labeled “everything in one BIG zip”, but it did not include any .CO files, so I rolled my own dec2co.sh <https://github.com/hackerb9/co2do/blob/histogram/histogram/kurtdekker/dec2co.sh> program. Later I found that bitchin100 /did/ have the .CO files, merely misfiled, and I was rather surprised to see that three of them did not exactly match mine. It seems there’s a bug in the tool Kurt released (FTU.BAS <https://github.com/hackerb9/co2do/blob/histogram/histogram/kurtdekker/util/FTU.BAS>) which occasionally causes it to emit bytes beyond the length specified in the .CO file header.


On Mon, Mar 9, 2026 at 3:24 PM Brian K. White <[email protected] <mailto:[email protected]>> wrote:

    On 3/9/26 14:03, Brian K. White wrote:
     > Wow, I haven't tested this enough to push it up to github yet (I
     > haven't even tried loading the result on a 100 yet to make sure it
     > actually decodes correctly) but I think I just reduced the output
     > .DO size from 5305 to 4378 just by applying a static offset to all
     > bytes before encoding.
     >
     > Almost a whole 1K out of 5 just from that!

    Ran ok. The decode time stayed the same, but the transfer time went
    down and the ram used went down of course.

-- bkw


