Wow. I haven't tested this enough to push it up to GitHub yet (I haven't
even tried loading the result on a 100 yet to make sure it actually
decodes correctly), but I think I just reduced the output .DO size from
5305 to 4378 bytes just by applying a static offset to all bytes before
encoding. Almost a whole 1K out of 5, just from that!
I read that yEnc adds 42 to every byte simply so that NULs don't have to
be encoded, because strings of NULs are common.
Sounds easy, so I tried it.
I did not expect it to make THAT much difference.
I actually tried a few different transforms. I don't know exactly what
yEnc does, but Wikipedia says "adds 42", so in our case that has to mean
wrapping results above 255 back around to 0 (i.e. mod 256). I have the
option to do that, but XOR needs less BASIC code to reverse, and it seems
to make an even smaller file.
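To make the rotate-vs-XOR difference concrete, here is a small bash sketch (the `show` function is just a hypothetical helper for this illustration) that prints what "add 42, wrap at 256" and "XOR 42" do to a few sample byte values. Both send NUL to 42, but they pull a different byte down into NUL: 214 under the rotation, 42 under the XOR. It also hints at why XOR is cheaper to undo: XOR with the same key reverses it, while the rotation needs the 256-42 complement.

```bash
#!/usr/bin/env bash
# Print each byte value next to its "add 42 mod 256" and "XOR 42" images.
show() {
  local b
  for b in "$@"; do
    printf '%3d  rot42=%3d  xor42=%3d\n' "$b" $(( (b + 42) % 256 )) $(( b ^ 42 ))
  done
}

show 0 42 214 250
```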
I don't know if 42 is a specially selected value that's statistically
optimal across most files; it's just what yEnc does, so I tried it. It's
trivial to use any value you want. It can be anything, but it needs to
be above 34 to move NUL to where it won't need to be encoded. I guess
you could also use 9 or 32 to get that. Who knows how much the other low
values matter. No matter what offset you apply to move some bytes out of
the range that needs encoding, other bytes will move in that now need to
be encoded and didn't before. *Something* will always still need to be
encoded, of course; it's just a game of statistics over which bytes
appear more often than others.
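One way to see that trade-off on a real file is to count how many bytes land in the "needs encoding" range after a given transform. The bash sketch below does that. Its escape rule is an ASSUMPTION: bytes below 35 except 9 and 32, loosely based on the remark above that an offset above 34, or 9 or 32, takes NUL out of that range. co2ba.sh's real rule may differ, and `needs_escape` / `count_escapes` are hypothetical names, not functions from co2ba.sh.

```bash
#!/usr/bin/env bash
# ASSUMPTION: hypothetical escape rule, not necessarily co2ba.sh's real one.
needs_escape() { local b=$1; (( b < 35 && b != 9 && b != 32 )); }

# usage: count_escapes FILE MODE KEY   (MODE is rot or xor)
# Prints how many bytes of FILE would need encoding after the transform.
count_escapes() {
  local file=$1 mode=$2 key=$3 n=0 b
  # od prints every byte of the file as an unsigned decimal number
  for b in $(od -An -v -tu1 "$file"); do
    if [[ $mode == rot ]]; then
      b=$(( (b + key) % 256 ))   # same as the ROT encoder step
    else
      b=$(( b ^ key ))           # same as the XROT encoder step
    fi
    needs_escape "$b" && n=$(( n + 1 ))
  done
  echo "$n"
}
```

For example, `count_escapes ALTERN.CO xor 64` versus `count_escapes ALTERN.CO xor 0` gives a rough before/after comparison, though the real .DO size also depends on how each escape is written out.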
So I just tried a few different things, more or less arbitrarily:
rotate by +42 (simply because that's what yEnc does), rotate by +64,
XOR 42, and XOR 64.
Here, setting ROT makes the encoder do this to every byte d[i] before
encoding:
for ((i=0;i<LEN;i++)) { ((d[i]=(d[i]+ROT)%256)) ; }
UNROT="B=(B+$((256-ROT)))MOD256:" (untested BASIC)
and setting XROT instead does:
for ((i=0;i<LEN;i++)) { ((d[i]=d[i]^XROT)) ; }
UNROT="B=BXOR$XROT:"
If ROT or XROT is, say, 64, UNROT will actually be "B=(B+192)MOD256:" or
"B=BXOR64:", and that gets inserted into the BASIC right after the byte
value B is finalized but before it is used for anything.
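Since the BASIC side is untested, a cheap sanity check is to run the same arithmetic in bash for all 256 byte values: apply the encoder step, then the UNROT formula, and confirm the original byte comes back. This only checks the math, not the generated BASIC itself; `check_roundtrip` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# usage: check_roundtrip ROT XROT
# For every byte value, encode with ROT / XROT and decode with the
# matching UNROT formula; print "ok" only if every value survives.
check_roundtrip() {
  local rot=$1 xrot=$2 b e
  for ((b = 0; b < 256; b++)); do
    e=$(( (b + rot) % 256 ))                    # encoder: (d+ROT)%256
    (( (e + 256 - rot) % 256 == b )) ||         # decoder: (B+(256-ROT)) MOD 256
      { echo "ROT=$rot fails at $b"; return 1; }
    e=$(( b ^ xrot ))                           # encoder: d^XROT
    (( (e ^ xrot) == b )) ||                    # decoder: B XOR XROT
      { echo "XROT=$xrot fails at $b"; return 1; }
  done
  echo "ok"
}

check_roundtrip 42 42
check_roundtrip 64 64
```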
Reference (no offset):
bkw@fw:~/src/dl2/tmp$ ../co2ba.sh ALTERN.CO call >TEST.DO ;ls -l TEST.DO
-rw-rw-r-- 1 bkw bkw 5305 Mar 9 13:12 TEST.DO
Rotate by +42:
bkw@fw:~/src/dl2/tmp$ ROT=42 ../co2ba.sh ALTERN.CO call >TEST.DO ;ls -l TEST.DO
-rw-rw-r-- 1 bkw bkw 4428 Mar 9 13:12 TEST.DO
Rotate by +64:
bkw@fw:~/src/dl2/tmp$ ROT=64 ../co2ba.sh ALTERN.CO call >TEST.DO ;ls -l TEST.DO
-rw-rw-r-- 1 bkw bkw 4415 Mar 9 13:12 TEST.DO
XOR 42:
bkw@fw:~/src/dl2/tmp$ XROT=42 ../co2ba.sh ALTERN.CO call >TEST.DO ;ls -l TEST.DO
-rw-rw-r-- 1 bkw bkw 4415 Mar 9 13:13 TEST.DO
XOR 64:
bkw@fw:~/src/dl2/tmp$ XROT=64 ../co2ba.sh ALTERN.CO call >TEST.DO ;ls -l TEST.DO
-rw-rw-r-- 1 bkw bkw 4378 Mar 9 13:13 TEST.DO
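For scale, this short bash sketch turns those TEST.DO sizes into savings relative to the 5305-byte reference (`savings_report` is just an illustrative name).

```bash
#!/usr/bin/env bash
# Bytes and percent saved by each run versus the 5305-byte reference,
# using the TEST.DO sizes from the ls -l output.
savings_report() {
  local ref=5305 entry name size
  for entry in "ROT=42 4428" "ROT=64 4415" "XROT=42 4415" "XROT=64 4378"; do
    read -r name size <<< "$entry"
    awk -v n="$name" -v s="$size" -v r="$ref" \
      'BEGIN { printf "%-8s %5d bytes, saves %4d (%.1f%%)\n", n, s, r - s, 100 * (r - s) / r }'
  done
}

savings_report
```

That works out to about 16.5% for ROT=42, 16.8% for ROT=64 and XROT=42, and 17.5% (927 bytes) for XROT=64.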
Obviously the amount of gain, and which offset value gains the most,
will depend on just what the input file happens to contain. Different
values must be shifting other byte values besides NUL that happen to be
particularly common in a given file. If some file just happens to have,
say, a lot of 12s in it, then an offset value that moves the 12s above 34
will make a big difference on that particular file.
And since, unlike yEnc or any other generic standard, we ship a bespoke
decoder with the payload, each payload can have its own optimal
parameters for things like that.
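Because the bytes needing encoding after an XOR depend only on the file's byte histogram, a per-payload search over all 256 keys can be cheap: build the histogram once, then for each key sum the counts of the bytes that would land in the encoded range. The bash sketch below picks the key with the fewest such bytes. It is only a proxy for final .DO size, it is not how co2ba.sh chooses anything, and its escape rule is the same hypothetical one as before (bytes below 35 except 9 and 32); `best_xor_key` is a hypothetical name.

```bash
#!/usr/bin/env bash
# ASSUMPTION: hypothetical escape rule, not necessarily co2ba.sh's real one.
needs_escape() { local b=$1; (( b < 35 && b != 9 && b != 32 )); }

# usage: best_xor_key FILE
# Prints "KEY COST": the XOR key (lowest on ties) that leaves the fewest
# bytes needing encoding, and that count.
best_xor_key() {
  local file=$1 b k v cost best_k=0 best_cost=-1
  local -a hist=() esc=()
  for ((b = 0; b < 256; b++)); do hist[b]=0; done
  for b in $(od -An -v -tu1 "$file"); do hist[b]=$(( hist[b] + 1 )); done
  # list every output value that would need encoding
  for ((v = 0; v < 256; v++)); do needs_escape "$v" && esc+=("$v"); done
  for ((k = 0; k < 256; k++)); do
    cost=0
    # input byte b becomes v after XOR with k exactly when b == v ^ k
    for v in "${esc[@]}"; do cost=$(( cost + hist[v ^ k] )); done
    if (( best_cost < 0 || cost < best_cost )); then
      best_k=$k; best_cost=$cost
    fi
  done
  echo "$best_k $best_cost"
}
```

Under this made-up rule, a file of three NULs followed by "ABC" comes out as key 9 with nothing left to encode, which lines up with the aside above that 9 would also move NUL out of the way. A real search would more likely just run co2ba.sh with each XROT and keep the smallest output.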
And this still isn't even doing RLE yet.
--
bkw