Spent all day figuring out a way to do the inner loop without IF.
Woohoo!

And it's slower. OH WELL!


Based on IF:

0READF:CLEAR2,F:DEFINTA-E:DEFSNGF-K:DEFSTRL-O:READF,A,J,G,N:E=128:M="!":C=0:I=F:H=F+A-1:K=0:D=0:CLS:?"Installing "N" 0%"


1READL:FORC=1TOLEN(L):O=MID$(L,C,1):IFO=MTHEND=E:NEXT:ELSEB=ASC(O)XORD:POKEI,B:D=0:I=I+1:K=K+B:NEXT:?@18,USING"###%";(I-F)*100/A:IFI<=HTHEN1


2IFK<>GTHEN?"Bad Checksum":ELSECALLJ


3DATA59346,3614,59346,454932,"ALTERN"



Based on math:

0READF:CLEAR2,F:DEFINTA-E,O-P:DEFSNGF-K:DEFSTRL-N:READF,A,J,G,N:M="":E=128:O=33:P=0:C=0:I=F:H=F+A-1:K=0:D=0:CLS:?"Installing "N" 0%"


1READL:FORC=1TOLEN(L):B=ASC(MID$(L,C,1)):P=SGN(BXORO):B=BXORE*D:POKEI,B:I=I+P:K=K+B*P:D=PXOR1:NEXT:?@18,USING"###%";(I-F)*100/A:IFI<=HTHEN1


2IFK<>GTHEN?"Bad Checksum":ELSECALLJ


3DATA59346,3614,59346,454932,"ALTERN"



Instead of choosing whether or not to poke, it always pokes.
On a "!" byte it pokes a 33, but it does not advance the current address or add to the checksum, and it stores the was-it-a-"!" status in a variable (D) that survives into the next iteration.

On the next byte, that previous-byte-was-"!" flag makes it flip the high bit of the current byte (XOR 128). Since the address wasn't advanced, the new byte is poked over the stray 33 at the same address, and this time the address and checksum do get updated after poking.

The point is that it does the same things for every byte without branching: just a string of assignments and math, and still all on one line number.
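
For anyone who finds the packed BASIC hard to follow, here is a rough Python model of the two inner loops. It is only a sketch: the function names and the bytearray standing in for RAM are made up for illustration, and min(x, 1) stands in for SGN() on a non-negative value.

```python
MARK = 0x21   # "!" (the escape mark, O=33 in the math version)
FLIP = 0x80   # E=128, XORed into the byte that follows a mark


def decode_with_if(data):
    """Model of the IF-based line 1: skip the mark, flip the next byte."""
    out = bytearray()
    checksum = 0
    flip = 0
    for ch in data:
        if ch == MARK:
            flip = FLIP          # D=E, then NEXT without poking
            continue
        b = ch ^ flip
        out.append(b)            # POKE I,B : I=I+1
        flip = 0
        checksum += b
    return bytes(out), checksum


def decode_branchless(data):
    """Model of the math-based line 1: same steps for every byte."""
    mem = bytearray(len(data))   # hypothetical stand-in for RAM at F
    addr = 0
    checksum = 0
    d = 0                        # 1 if the previous byte was the mark
    for ch in data:
        p = min(ch ^ MARK, 1)    # P=SGN(B XOR O): 0 on the mark, else 1
        b = ch ^ (FLIP * d)      # B=B XOR E*D
        mem[addr] = b            # always poke, even on the mark
        addr += p                # I=I+P
        checksum += b * p        # K=K+B*P
        d = p ^ 1                # D=P XOR 1
    return bytes(mem[:addr]), checksum
```

Feeding both the same encoded bytes, for example b"AB!\xc1C", should give the same decoded bytes and the same checksum; the only difference is how much work is done per byte.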

The old way takes 2:22 and the new way takes 2:57 (about 25% slower) with everything else identical: the exact same payload after line 3.

--
bkw



On 3/6/26 04:51, B 9 wrote:
It's not 100% complete yet, but I made a two-stage BASIC loader <https://github.com/hackerb9/co2do/blob/main/co2do> that includes a little 8085 routine <https://github.com/hackerb9/co2do/blob/main/decode.asm> for speed. You just run ./co2do FOO.CO and it'll create a FOO.DO that can load speedily.

I took John Hogerhuis's advice to put the machine language in a BASIC string and call it from there, which helped a lot with portability as I don't need to have a per-machine table of free spaces in RAM. Speed is great on my Tandy 200. I don't have a Model 100, but when I simulate it in Virtual T, it seems to take about 14 seconds to tokenize the BASIC. The time to decode the bytes and load it into RAM is a fraction of a second, so I'd say the overall speed is about 15 seconds. You can try loading the resulting .DO file <https://raw.githubusercontent.com/hackerb9/co2do/refs/heads/main/testfiles/ALTERN.DO> on a M100 or T102 to see for yourself. (I recommend using it with `RUN "COM:88N1E"`).

My co2do <https://github.com/hackerb9/co2do/blob/main/co2do> program is usable now, but incomplete in a few ways:

  * It does not handle the serial transfer itself, so the program is
    still reliant on the speed of the BASIC LOAD/RUN commands.
  * It doesn't run at all on the NEC PC-8201/8300 machines as I haven't
    implemented VARPTR yet.
  * It also hasn't been size optimized at all yet.
  * It needs better memory error warnings and documentation.
  * It should be shipped as a single file instead of having a separate
    ASM file.

—b9

On Thu, Mar 5, 2026 at 9:21 PM Brian K. White wrote:


    It works all on one line after all!
    It doesn't make the file any shorter; it's actually a couple of bytes
    larger, because "ELSE:" is longer than "\r2".
    But it did get about 1 second faster.


    
    0READT:CLEAR2,T:DEFINTI,O,C,V,L:DEFSNGA,K,S,T,X,E:DEFSTRB,M,D,N:READT,L,X,K,N,O,M:E=T+L-1:A=T:S=0:C=0:CLS:PRINT"Installing "N
    1PRINT@20,CINT((L-(E-A))/L*100)"%":READD:FORI=1TOLEN(D):B=MID$(D,I,1):IFB=MTHENC=O:NEXT:ELSEV=ASC(B)-C:POKEA,V:C=0:A=A+1:S=S+V:NEXT:IFA<=ETHEN1
    2PRINT:IFS<>KTHENPRINT"Bad Checksum":END
    3CALLX
    4DATA59346,3614,59346,454932,"ALTERN.CO",64,"!"


    On 3/4/26 07:11, Brian K. White wrote:
     >
     > I've gotten co2ba.sh about as good as I think I'm going to get it for
     > now.
     >
     > It generates a larger block of loader code, which also runs slower,
     > but the file is somewhat smaller and that ends up making the total job
     > take about the same time with a 3.6k sample input co file.
     >
     >
     > The sample file used for these tests and comparisons is ALTERN.CO,
     > manually reconstituted from
     > https://github.com/LivingM100SIG/Living_M100SIG/blob/main/M100SIG/Lib-07-UTILITIES/ALTERN.100
     >
     >
     >
     > My previous loader code looks like this:
     > (extra blank lines to help reading after email wraps the long lines):
     >
     > -----------
     > 0CLEAR0,59346:A=59346:S=0:N$="ALTERN.CO":CLS:?"Installing "N$" ..";
     >
     >
     > 1D$="":READD$:FORI=1TOLEN(D$)STEP2:B=(ASC(MID$(D$,I,1))-97)*16+ASC(MID$(D$,I+1,1))-97:POKEA,B:A=A+1:S=S+B:NEXT:?".";:IFA<62960THEN1
     >
     >
     > 2IFS<>454932THEN?"Bad Checksum":END
     >
     > 3CALL59346
     >
     > 4DATAmndmpfmndbeccklppfolcbim...
     > -----------
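
For reference, the old format stores each byte as two letters from "a" to "p", one per 4-bit nibble, which is why the file comes out at roughly twice the size of the .CO. A minimal Python sketch of what line 1 computes per pair (the function name is made up for illustration):

```python
def decode_letter_pairs(text):
    """Turn pairs of letters a..p back into bytes, like the old line 1."""
    out = bytearray()
    for i in range(0, len(text), 2):     # FORI=1TOLEN(D$)STEP2
        hi = ord(text[i]) - 97           # ASC(MID$(D$,I,1))-97
        lo = ord(text[i + 1]) - 97       # ASC(MID$(D$,I+1,1))-97
        out.append(hi * 16 + lo)         # B=hi*16+lo : POKEA,B
    return bytes(out)


# First few pairs of the DATA line quoted above.
first_bytes = decode_letter_pairs("mndmpfmndbec")
```

With the sample DATA above, "mn" is 12*16+13 = 205 (CD hex), "dm" is 3*16+12 = 60 (3C hex), and so on.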
     >
     >
     >
     >
     >
     > With the new scheme I'm down to this
     >
     > -----------
     >
    
     > 0READT:CLEAR2,T:DEFINTI,O,C,V,L:DEFSNGA,K,S,T,X,E:DEFSTRB,M,D,N:READT,L,X,K,N,O,M:E=T+L-1:A=T:S=0:C=0:CLS:PRINT"Installing "N
     >
     >
     > 1PRINT@20,CINT((L-(E-A))/L*100)"%":READD:FORI=1TOLEN(D):B=MID$(D,I,1):IFB=MTHENC=O:NEXT
     >
     >
     >
     > 2V=ASC(B)-C:POKEA,V:C=0:A=A+1:S=S+V:NEXT:IFA<=ETHEN1
     >
     >
     > 3PRINT:IFS<>KTHENPRINT"Bad Checksum":END
     >
     >
     > 4PRINT"Done. Please type: NEW":SAVEMN,T,E,X
     >
     >
     > 5DATA59346,3614,59346,454932,"ALTERN.CO",64,"!"
     >
     > 6DATA"Í<õÍ1B*¿õë!aŒ!DÍ õ|µÊêç...
     > -----------
     >
     >
     > Part of the size difference is some apples/oranges differences that
     > make it not a direct comparison. The two could be more similar than
     > this if I wanted. Previously I just had the generator write the co
     > header variables directly in the code instead of having a header data
     > line, while in the new one I'm doing it all from a data line, because
     > I like that the loader code then is self contained & portable. You
     > could copy the loader block and stick it on top of some other payload
     > and it would work.
     >
     > And another part is I made a real percent-done display on the new one
     > because it doesn't cost any run time, just a few more bytes of file
     > size. It only runs once per data line and outside of the inner loop.
     >
     > The defint/defsng etc making line 0 longer also makes it run several
     > seconds faster.
     >
     > I actually have an even slightly shorter version just by using the
     > range syntax for the DEF*
     > DEFINTA-E:DEFSNGF-K:DEFSTRL-O
     > vs
     > DEFINTI,O,C,V,L:DEFSNGA,K,S,T,X,E:DEFSTRB,M,D,N
     > but it makes the code just about unreadable since the letters lose all
     > meaning.
     >
     > The notable points:
     >
     > no goto in the inner loop, just next.
     > saved a line and also made it so that the generator script doesn't
     > have any forward references, so it can just increment line numbers
     > without having to hard code like a GOTO3 on line 1 etc.
     >
     > Instead of
     > O=64 C=0 ... IFB=MTHENC=1  ... V=ASC(B)-(O*C)
     > (on every byte set a decode flag to 0 or 1, then multiply the encoding
     > offset by the on/off flag to enable/disable the offset)
     >
     > Just
     > O=64 C=0 ... IFB=MTHENC=O  ... V=ASC(B)-C
     > (instead of setting the decode flag to 0 or 1, just set it to 0 or the
     > actual offset value, then just subtract it directly without the
     > multiplication step. Always subtract; sometimes it's 0, sometimes
     > it's 64.)
     > As far as I can tell, 0, 1, and 64 are all the same int and the same
     > work to process as long as the variables are declared to the same type.
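
A rough Python model of the two variants, to show they decode identically. This is a sketch only: the function names are made up, and it assumes well-formed input where the byte after a mark is always at least the offset.

```python
def decode_flag_times_offset(data, mark=0x21, offset=64):
    """Variant 1: keep a 0/1 flag and multiply it by the offset."""
    out = bytearray()
    flag = 0
    for ch in data:
        if ch == mark:
            flag = 1                     # IFB=MTHENC=1
            continue
        out.append(ch - offset * flag)   # V=ASC(B)-(O*C)
        flag = 0
    return bytes(out)


def decode_offset_as_flag(data, mark=0x21, offset=64):
    """Variant 2: store 0 or the offset itself, then just subtract."""
    out = bytearray()
    c = 0
    for ch in data:
        if ch == mark:
            c = offset                   # IFB=MTHENC=O
            continue
        out.append(ch - c)               # V=ASC(B)-C
        c = 0
    return bytes(out)
```

For example, the encoded bytes EB 21 61 8C (the "!a" pair in the sample payload) come out as EB 21 8C from either function, since 61 hex minus 64 is 21 hex.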
     >
     > Already mentioned all variables from data, can-nable loader code etc.
     >
     > If the top address is the very first data value, you can read it, use
     > it to clear, and then just read it again to still have it after the
     > clear without wasting much space or cpu, and without needing the
     > generator script to write the value twice in duplicate assignments
     > before & after the clear.
     >
     > Already mentioned the fancy percent-done progress.
     >
     > The generator script has config options so you can change the behavior
     > at run-time by env variables.
     >
     > So you can change the starting line number, the line number increment,
     > the length of the data lines, the encoding mark character, the
     > encoding offset value.
     >
     > I have the generator script now counting all the bytes in the output
     > line when building data lines and deciding when to start a new line,
     > so now every line fills to the specified max length as much as
     > possible even though the size of the data varies because of the varied
     > encoding.
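
The line-filling idea can be sketched in Python roughly as below. This is not the actual co2ba.sh logic: the set of bytes that get escaped is an assumption made for illustration (control characters, the double quote and the mark itself), as are the function names, the defaults, and the choice not to count the line ending.

```python
def escape(byte, mark=0x21, offset=64):
    """Return one raw byte, or mark + (byte + offset) for unsafe bytes.

    ASSUMPTION: the unsafe set below is a guess, not taken from co2ba.sh.
    """
    unsafe = set(range(32)) | {mark, 0x22}
    if byte in unsafe:
        return bytes([mark, byte + offset])
    return bytes([byte])


def pack_lines(payload, max_len, first_line=6, step=1):
    """Fill each DATA line up to max_len bytes without splitting a pair."""
    lines = []
    number = first_line
    head = f'{number}DATA"'.encode("ascii")
    body = bytearray()
    for byte in payload:
        token = escape(byte)             # 1 or 2 bytes, kept together
        if body and len(head) + len(body) + len(token) > max_len:
            lines.append(head + bytes(body))
            number += step
            head = f'{number}DATA"'.encode("ascii")
            body = bytearray()
        body += token
    if body:
        lines.append(head + bytes(body))
    return lines
```

Counting the whole output line, prefix included, is what lets every line fill up to the limit even though each input byte may take one or two bytes once encoded.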
     >
     > All in all, the new way generates a smaller file, but the loader code
     > runs slower, and it ends up taking almost exactly the same total time
     > to load. The smaller file size is a win though, and the total time is
     > actually *slightly* in favor of the new way.
     >
     > The new scheme is conceptually simple but it takes 2 lines of code and
     > includes an IF branch, whereas in the old way the entire loop is on a
     > single line and the same math ops happen for every byte, no branching.
     >
     > I read that one optimization is to move initialization/setup code to
     > the end instead of the top, and use goto or gosub to jump to it and
     > back, and have your tight loop as close to the top as possible.
     > Something about BASIC searching from the start of the file repeatedly?
     > Well I tried that and it made no difference in my run times. I tried
     > both goto and gosub.
     >
     > For now I kept the old script in the repo as co2ba_old.sh since its
     > output is probably still useful, being all pure low ascii printable
     > text.
     >
     > Converting the same input:
     > ALTERN.CO 3620 bytes
     >
     > old:
     > 7749 bytes
     > xfer time: 1:05
     > load time: 2:04
     > total: 3:09
     >
     > new:
     > 5334 bytes
     > xfer time: 0:45
     > load time: 2:23
     > total: 3:07
     >
     > https://github.com/bkw777/dl2/blob/master/co2ba.md
     > https://github.com/bkw777/dl2/blob/master/co2ba.sh
     >
     >
     > Anyway, thanks again for the idea Steve!
     >

-- bkw
