On 3/10/26 22:31, B 9 wrote:
Nifty! I had been thinking about how to encode NULLs in a single byte
since they are so much more common, but hadn’t known about the yEnc
offset. Doing some histogram analysis on the most easily available batch
<https://github.com/hackerb9/co2do/tree/histogram/histogram/kurtdekker/
co> of .CO files, it does not look like 42 is an optimal offset. I like
the idea of a bespoke offset! Here’s a sample program which can
determine that for you, for a single .CO file or a whole directory of
them: histco.py <https://github.com/hackerb9/co2do/blob/histogram/
histogram/histco.py>:
$ ./histogram/histco.py testfiles/ALTERN.CO
Unrotated: 4758 bytes. Can save 956 bytes (20.09%)
Rotation +143 => 3802 bytes.
Rotation of +42 would save 871 bytes (18.31%)

$ cd histogram/kurtdekker/co/
$ ../../histco.py
Unrotated: 111237 bytes. Can save 18353 bytes (16.50%)
Rotation +136 => 92884 bytes.
Rotation of +42 would save 6696 bytes (6.02%)
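(For anyone following along, the histogram idea can be sketched roughly like this. The unsafe-byte set below is a guess for illustration only — the thread mentions NUL and 127 at least — and the real histco.py may compute things differently:)

```python
# Sketch of a histogram-based search for the best rotation offset.
# UNSAFE is a hypothetical escape set, NOT taken from histco.py.
from collections import Counter

UNSAFE = {0x00, 0x0A, 0x0D, 0x1A, 0x7F}  # assumed, for illustration

def best_rotation(data: bytes) -> int:
    hist = Counter(data)
    def cost(n: int) -> int:
        # how many input bytes land on an unsafe value after rotating by +n
        return sum(hist[(u - n) % 256] for u in UNSAFE)
    return min(range(256), key=cost)
```

Counting from the histogram makes each candidate offset cost O(|UNSAFE|) lookups instead of a full pass over the file, which is the whole appeal over re-scanning the binary 256 times.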
Super cool.
It's curious that you arrived at +143 for that file.
I just added a brute force scanner which tries all possible values 0-255
using xor and using rot, and for the same file I get +122 (rot122)
$ time co2ba ALTERN.CO call |wc -c
4401
real 0m0.097s
user 0m0.070s
sys 0m0.031s
$ time XA=best co2ba ALTERN.CO call |wc -c
trying all possible XA values...
XA=+122
4354
real 0m5.571s
user 0m5.541s
sys 0m0.032s
For the XA variable I'm using a convention that ^val means xor by val,
and +val means rotate by val
Apparently it makes a difference because the best possible rotate did
slightly better than the best possible xor, when all 256 possible values
were tried. So it's not the case that for every xor result there is an
equivalent rot result at some other offset.
But I wonder if my scanner logic is bad because we both should have
gotten the same value. I presume we both did the same rotate:
(byte+n)%256
I'm not doing anything efficient, it's super brute force.
I start with t=LEN, s=LEN*2, and then walk the binary and t++ every
time I hit any of the unsafe values using xorN. Then at the end, if t<s
then s=t. t is the total bytes when using xorN; s is the smallest t
seen so far. Repeat for rotN. Repeat both for N=0-255.
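(A minimal Python sketch of that scan, with a placeholder unsafe set standing in for co2ba's actual escape list:)

```python
# Brute-force scan over every xor and rotate value, per the description
# above: t = LEN plus one extra byte for every unsafe value hit, and
# s tracks the smallest t seen so far.
UNSAFE = {0x00, 0x0A, 0x0D, 0x7F}  # stand-in set, not co2ba's real one

def scan(data: bytes):
    s = len(data) * 2                 # worst case: every byte escaped
    best = ('+', 0)
    for n in range(256):
        for op in ('^', '+'):
            enc = (b ^ n if op == '^' else (b + n) % 256 for b in data)
            t = len(data) + sum(1 for b in enc if b in UNSAFE)
            if t < s:
                s, best = t, (op, n)
    return best, s
```

That is 512 full passes over the file, which matches the several-seconds runtime shown above.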
I am now also including 127 by default like you are. It's so useful
it's basically user-hostile not to, and it hardly costs anything.
Especially after this gain from the rotate. So there should be no
difference from that.
With some debug echoes added:
$ XA=best ../co2ba.sh ALTERN.CO call |wc -c
trying all possible XA values...
^0 -> 4752
+0 -> 4752
^1 -> 4718
+1 -> 5552
^2 -> 4723
+2 -> 5514
^3 -> 4782
+3 -> 5531
^4 -> 4697
+4 -> 5515
...
^120 -> 3903
+120 -> 3815
^121 -> 3892
+121 -> 3814
^122 -> 3890
+122 -> 3806
^123 -> 3886
+123 -> 3807
...
+140 -> 3877
^141 -> 3839
+141 -> 3856
^142 -> 3837
+142 -> 3883
^143 -> 3834
+143 -> 3880
^144 -> 3846
+144 -> 3887
^145 -> 3840
+145 -> 3892
...
^248 -> 4704
+248 -> 3942
^249 -> 4697
+249 -> 3939
^250 -> 4705
+250 -> 3946
^251 -> 4702
+251 -> 3936
^252 -> 4640
+252 -> 3911
^253 -> 4702
+253 -> 3941
^254 -> 4699
+254 -> 3968
XA=+122 -> 3806 bytes
4354
And there is in fact 3806 bytes of payload after I manually remove the
basic and linebreaks, so at least I'm counting right. The loop that
generates the payload is a totally separate thing later, and the two
things agree on the total at least. And the file works, passes checksum
and runs etc.
Oh yeah I switched to a rolling xor checksum too.
It only needs ints in basic and I think actually catches more errors.
Still kind of swiss cheese compared to real crc algorithms but those are
expensive and this is cheap.
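(Encoder-side, a rolling xor checksum is just one byte of running state. A sketch of the idea — my code, not the actual co2ba implementation:)

```python
# Rolling xor checksum sketch.  The running value never leaves 0-255,
# so the BASIC loader can track it in an ordinary 16-bit integer,
# e.g. something like C=C XOR P after each POKE, assuming the dialect
# has an XOR operator.
def xor_checksum(data: bytes) -> int:
    c = 0
    for b in data:
        c ^= b
    return c
```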
Another idea I did before that was just add the total byte count to the
sum, and have the loader add an extra +1 per poke. That way even if the
data was all 0's, as long as the +1's were actually added one at a time
along the way the sum would break if any bytes were lost, or added for
that matter.
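(That scheme amounts to comparing sum-plus-count on the sending side against an accumulation of byte+1 per poke on the loader side. A toy illustration, with hypothetical helper names:)

```python
# The transmitted check value is byte sum plus byte count; the loader
# adds (b + 1) per poke, so a lost byte breaks the match even when
# the data is all zeros.
def sender_check(data: bytes) -> int:
    return sum(data) + len(data)

def loader_check(data: bytes) -> int:
    s = 0
    for b in data:
        s += b + 1        # the extra +1 per poke
    return s
```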
That worked, but I think the xor is even cheaper and still improves on
the simple sum. I think the main weakness is with strings of repeating
bytes (any value, not just 0): the way xoring the same 2 values exactly
reverses itself means that you can drop one byte and catch it, but if
you drop 2 of the same byte in a row, you wouldn't know it. Those 2
bytes would have had no effect on the sum when they were both present.
Maybe I should go back to adding the total length to the final
comparison after all. That should catch even more & freakier errors and
is still essentially free. I can even still keep the ints. The max
possible file size (29.6k) plus the max possible checksum (255) is
still way short of max int (32k).
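(A quick demonstration of both the blind spot and the length fix — a sketch, not the shipped checksum code:)

```python
# Dropping two adjacent copies of the same byte leaves a pure xor
# checksum unchanged; adding the length to the comparison catches it.
def xsum(data: bytes) -> int:
    c = 0
    for b in data:
        c ^= b
    return c

good = bytes([5, 7, 7, 9])
bad = bytes([5, 9])                    # the two identical 7s lost in transit

assert xsum(good) == xsum(bad)         # xor alone misses the error
assert xsum(good) + len(good) != xsum(bad) + len(bad)  # length catches it
```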
Also I'm calling it !yenc. Because it's not yenc. And yet essentially
is, so it's descriptive of its properties both ways.
I started on rle but it's kind of a transporter accident still. It sorta
mostly lives...
--
bkw
—b9
P.S. A possible lesson for us on rolling our own encoding: Searching for
sample .CO files to test with my histogram program, I found a ZIP file
of Kurt Dekker’s games on Bitchin 100 <https://bitchin100.com/m100-oss/
archive.html>. Kurt actually released those in his own DEC format
<https://github.com/hackerb9/co2do/blob/histogram/histogram/kurtdekker/
util/FTU.TXT>. I downloaded the link labeled “everything in one BIG
zip”, but it did not include any .CO files, so I rolled my own dec2co.sh
<https://github.com/hackerb9/co2do/blob/histogram/histogram/kurtdekker/
dec2co.sh> program. Later I found that bitchin100 /did/ have
the .CO files, merely misfiled, and I was rather surprised to see that three
of them did not exactly match mine. It seems there’s a bug in the tool
Kurt released (FTU.BAS <https://github.com/hackerb9/co2do/blob/
histogram/histogram/kurtdekker/util/FTU.BAS>) which occasionally causes
it to emit bytes beyond the length specified in the .CO file header.
On Mon, Mar 9, 2026 at 3:24 PM Brian K. White <[email protected]> wrote:
On 3/9/26 14:03, Brian K. White wrote:
> Wow, I haven't tested this enough to push it up to github yet (I
> haven't even tried loading the result on a 100 yet to make sure it
> actually decodes correctly) but I think I just reduced the output .DO
> size from 5305 to 4378 just by applying a static offset to all bytes
> before encoding.
>
> Almost a whole 1K out of 5 just from that!

Ran ok. The decode time stayed the same, but the transfer time went
down and the ram used went down of course.
--
bkw