Tony Bedford wrote:
>I have a loop that needs to clear a 4096 byte area.
>I would like it to be as fast as possible, but this is
>not critical.

Even if you don't use LDIR as already suggested, your routine is pretty 
inefficient. I'll give you a faster version, so you can pick up a few 
simple tricks.
Here is your original listing with my comments:

         ; need to clear data area
         ; data area is 4096 bytes
         ; 4096 = (255*16)+16            ;; 256*16 is better, see following

         ld hl,DATA_AREA
         ld c,16

@@start_loop1:

         ld b,0                  ; making b 0 will loop 256 times
                                 ; this is because b is decreased BEFORE it's
                                 ; checked for 0.

         xor     a               ; keep the 0 in A: 'ld (hl),a' is 3 cycles
                                 ; faster than 'ld (hl),0', so this saves
                                 ; 3*256 cycles per pass of the inner loop

         ; by the way, you can do 'xor a' and then 'ld b,a' to save
         ; an additional 3 cycles.

@@clear_byte:
         ld (hl),a               ; so use A here, not 0.
         inc hl
         djnz @@clear_byte

         dec c
         jp nz,@@start_loop1

The rest of the routine was unnecessary, since the inner loop now runs 256 
times and 16 passes cover all 4096 bytes.
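
You can take this one step further. After DJNZ falls through, B is 0 again, 
and nothing in the loop changes A, so both can be set up once before the 
loop. A sketch (untested, same labels as above):

         ld hl,DATA_AREA
         xor a                   ; A = 0
         ld b,a                  ; B = 0, so DJNZ loops 256 times
         ld c,16                 ; 16 * 256 = 4096 bytes

@@clear_byte:
         ld (hl),a
         inc hl
         djnz @@clear_byte       ; B ends at 0, ready for the next pass

         dec c
         jp nz,@@clear_byte

For comparison, the LDIR approach is usually written something like this 
(my guess at what was suggested, not a copy of it):

         ld hl,DATA_AREA
         ld de,DATA_AREA+1
         ld bc,4095              ; 4096 - 1, the first byte is set by hand
         ld (hl),0
         ldir                    ; copies each 0 one byte forward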

>Any suggestions? Does anyone know of any good docs on Z80 code
>optimisation?

There are just a few small tricks; the rest is 'common sense' (which 
depends heavily on one's natural talent for logic).

Instead of 'LD A,0' use 'XOR A'.
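
Side by side, with standard Z80 figures (not counting MSX wait states):

ld a,0          ; 2 bytes, 7 cycles, flags untouched
xor a           ; 1 byte, 4 cycles, but sets Z and clears carry

So only use it where you don't still need the flags from an earlier 
instruction.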

Instead of 'CP 0' use 'OR A'. If you have a list of CPs and don't need 
to keep the value being tested, you can use:
or a            ; instead of CP 0
jr z, its_0
dec a           ; instead of CP 1
jr z, its_1
dec a           ; instead of CP 2
jr z, its_2
etc...

If you need the sign flag of A, don't use PUSH AF/POP HL or some similar 
construct. You can just use 'OR A' or 'AND A'. I usually write 'AND A' 
when I only want the flags and 'OR A' when I mean 'CP 0', to tell the two 
uses apart.
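
For example (a sketch; 'value', 'is_negative' and 'is_zero' are made-up 
names):

ld a,(value)            ; LD does not change the flags
and a                   ; now S and Z reflect the contents of A
jp m,is_negative        ; JR has no M condition, so it has to be JP
jr z,is_zero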

For multiplication and division try to use bit shifts and ADDs. 
General-purpose multiply/divide routines are often unnecessarily slow. 
a * 40 for instance:
ld l,a
ld h,0
add hl,hl ; hl = a * 2
add hl,hl ; hl = a * 4
add hl,hl ; hl = a * 8
ld e,l
ld d,h
add hl,hl ; hl = a * 16
add hl,hl ; hl = a * 32
add hl,de ; hl = a * 32 + a * 8 = a * 40
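
Division by a power of two works the same way with right shifts. An 
unsigned hl / 4, as a sketch:

srl h     ; bit 0 of h goes to carry
rr l      ; and from carry into l, hl = hl / 2
srl h
rr l      ; hl = hl / 4 (the remainder is lost)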

Don't forget the shadow set of registers. EX AF,AF' to temporarily save the 
A register is much (MUCH!) faster than PUSH/POP AF.
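
A sketch of what I mean. It assumes nothing else (an interrupt routine for 
instance) uses AF' in between:

ex af,af'       ; park A and the flags in AF'
ld a,1          ; (stand-in for any code that destroys A)
ex af,af'       ; A and the flags are back

That's 2 x 4 cycles against 11 + 10 for PUSH AF/POP AF (standard Z80 
figures).
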
Also EX DE,HL is often very valuable:
; shift left 32 bit value HLDE
ex de,hl
add hl,hl
ex de,hl
adc hl,hl

Well, I've probably forgotten a few other tricks; maybe I'll post them if 
I remember them.

Greetz,
         Patriek

--
For info, see http://www.stack.nl/~wynke/MSX/listinfo.html
