On Tue, 3 Sep 1996, Jos Hulzink wrote:

> You are right. The PC came, and the art of programming was lost. Where
are
> the real bitsniffers, the ones who try to use every clockcycle.... See
the
> MSX-BIOS structure and the MS-WINDOWS structure.. They both are Microsoft
> products :) 

There are programmers on the PC that do know how to program (look at the 
PC demoscene), but they don't work for Microsoft. Remember that Microsoft 
is not there to make beautiful programs but to make money. Spending a lot 
of time to optimize the code will lower their profit and therefore they 
don't do it. In the time that every byte and clock cycle counted (1984) 
the people at Microsoft could do it, just look at some of the BIOS and 
disk ROM code. Not all of it is that good, but some parts are really 
creative. One example from the diskROM code: (I don't know if it is 
Microsoft or ASCII to thank) 

label1: OR A
        JR C            ; offset is not included
label2: SCF
        <routine>

The 'OR A' resets the carry flag as a side effect. Then the 'JR C' doesn't 
jump. In that way the 'SCF' (Set Carry Flag) is skipped because the Z80 
thinks it is the offset of the 'JR C' instruction. The normal way to do 
this is longer and slower:

label1: SCF
        CCF
        JR label3
label2: SCF
label3: <routine>

> Who can write the shortest multiply and division routines ? Let's have a
> contest... The one who can divide (16 bit / 8 bit) and multiply ( 16 *
16)
> with the least clockcycles, wins two free entrances for THE MSX FAIR OF
THE
> WORLD (Tilburg 97). Use of R800 instructions is not allowed :)

One problem: there are routines (like my 8 * 8 bits multiply) that don't 
multiply all numbers equally fast. Which time will you take? Best case 
isn't a good choice because you could make a routine that calculates 0*0 
in a few clockticks and all other multiplications pretty slow. Worst case 
is a better choice, but still not good because it doesn't give a good 
view of the performance of the routine in an actual program. To make an 
average of all numbers is a lot of work because there are 2^32 different 
multiplications to be carried out. That's would take about a day to 
calculate on a 3.5 MHz MSX. Maybe you can make a file with about 10.000 
multiplications that are randomly chosen. And check all the results of 
the routines too, because it has to be correct. Testing just a few values 
won't work because in special cases (for example n*0) it might not work 
if not properly written.

And what about other rules:
- Which registers may be used? I think that the NOP solution which uses 
  the stackpointer shouldn't be allowed without a DI and EI (which 
  consume some clockticks) or maybe not at all, because (as Alex pointed 
  out) an interrupt during the divide routine could hang your MSX.
- Are lookup tables allowed (and how big)? I can make a REALLY fast routine

  if I am allowed to use a 16GB lookup table... And how much empty memory 
  is the routine allowed to use?
- The 16*16 bits multiply, how many bits does the result have? Normally 
  that would be 32bits, but for example the NOP routine gives a 16bit 
  result. That is not a problem because usually you don't need a 32bit 
  result in a real program but there is the fact that a routine that 
  produces a 32bit result will be slower than a routine that produces a 
  16bit result.
- You probably meant unsigned multiplication and division, but never 
  stated that explicitly.
- How big may the code be? As someone already wrote, it is faster to 
  write the same code 16 times than to make a loop that is executed 16 
  times, and it saves a register as well.

If you are going to have a contest, better make some rules first!

> Send your solution to : [EMAIL PROTECTED]

And to the mailing list too please, because a lot of people can learn 
from your solution.

Bye,
                Maarten





Reply via email to