On Tue, 3 Sep 1996, Jos Hulzink wrote:
> You are right. The PC came, and the art of programming was lost. Where
are
> the real bitsniffers, the ones who try to use every clockcycle.... See
the
> MSX-BIOS structure and the MS-WINDOWS structure.. They both are Microsoft
> products :)
There are programmers on the PC that do know how to program (look at the
PC demoscene), but they don't work for Microsoft. Remember that Microsoft
is not there to make beautiful programs but to make money. Spending a lot
of time to optimize the code will lower their profit and therefore they
don't do it. In the time that every byte and clock cycle counted (1984)
the people at Microsoft could do it, just look at some of the BIOS and
disk ROM code. Not all of it is that good, but some parts are really
creative. One example from the diskROM code: (I don't know if it is
Microsoft or ASCII to thank)
label1: OR A
JR C ; offset is not included
label2: SCF
<routine>
The 'OR A' resets the carry flag as a side effect. Then the 'JR C' doesn't
jump. In that way the 'SCF' (Set Carry Flag) is skipped because the Z80
thinks it is the offset of the 'JR C' instruction. The normal way to do
this is longer and slower:
label1: SCF
CCF
JR label3
label2: SCF
label3: <routine>
> Who can write the shortest multiply and division routines ? Let's have a
> contest... The one who can divide (16 bit / 8 bit) and multiply ( 16 *
16)
> with the least clockcycles, wins two free entrances for THE MSX FAIR OF
THE
> WORLD (Tilburg 97). Use of R800 instructions is not allowed :)
One problem: there are routines (like my 8 * 8 bits multiply) that don't
multiply all numbers equally fast. Which time will you take? Best case
isn't a good choice because you could make a routine that calculates 0*0
in a few clockticks and all other multiplications pretty slow. Worst case
is a better choice, but still not good because it doesn't give a good
view of the performance of the routine in an actual program. To make an
average of all numbers is a lot of work because there are 2^32 different
multiplications to be carried out. That's would take about a day to
calculate on a 3.5 MHz MSX. Maybe you can make a file with about 10.000
multiplications that are randomly chosen. And check all the results of
the routines too, because it has to be correct. Testing just a few values
won't work because in special cases (for example n*0) it might not work
if not properly written.
And what about other rules:
- Which registers may be used? I think that the NOP solution which uses
the stackpointer shouldn't be allowed without a DI and EI (which
consume some clockticks) or maybe not at all, because (as Alex pointed
out) an interrupt during the divide routine could hang your MSX.
- Are lookup tables allowed (and how big)? I can make a REALLY fast routine
if I am allowed to use a 16GB lookup table... And how much empty memory
is the routine allowed to use?
- The 16*16 bits multiply, how many bits does the result have? Normally
that would be 32bits, but for example the NOP routine gives a 16bit
result. That is not a problem because usually you don't need a 32bit
result in a real program but there is the fact that a routine that
produces a 32bit result will be slower than a routine that produces a
16bit result.
- You probably meant unsigned multiplication and division, but never
stated that explicitly.
- How big may the code be? As someone already wrote, it is faster to
write the same code 16 times than to make a loop that is executed 16
times, and it saves a register as well.
If you are going to have a contest, better make some rules first!
> Send your solution to : [EMAIL PROTECTED]
And to the mailing list too please, because a lot of people can learn
from your solution.
Bye,
Maarten