Re: 15 computer science collegians looking for a project

NoiseEHC Wed, 30 Apr 2008 01:18:54 -0700

On 29/04/08 17:41 +0200, NoiseEHC wrote:
On this page
http://wiki.laptop.org/go/Geode_LX
I have named some instructions as "Synchronized ops" (in the MMX section). Are those real or did I mismeasure something?
That section is very difficult to understand.  I'm not sure which
operations you have invented this name for.

As you have probably already noticed, I am not a native English speaker (and I never learned advanced English in school, I just picked it up). What I wanted to say in that section is that every MMX op whose source or destination operand is an integer register (and which is not a MOV) consumes a number of clock cycles completely different from 2 (2 is listed for almost every MMX op in the databook, at least in my version). Is that real?

If those are real, then would somebody from AMD just go through the databook and fix the instruction clock cycle numbers? Because in that case it is certain that they do not match reality, and clearly I have better things to do than measuring clock cycles.
Clearly you must have some basis for assuming that the numbers are
wrong, so you must have done some measurement.  I consulted the
secret documentation that you claim I am withholding from you, and the timings there are the same as in the datasheet. I believe that
you are correct in that these are the clock counts for the instruction to
go through the FPU and don't include the stall time for the pipeline
to clear up.

There is a "Test results" section on that page. The first two tests were conducted via email: I emailed test programs to this list, and people ran them and emailed back the results. The first test in particular has some stupid bugs because I wrote it essentially blind. The third one is the result of my session logged into a physical machine. It may be that only this "stall time" is missing from the databook, but the fact is that I, as a programmer, am not interested in how many clock cycles the FPU takes to execute some internal operation (which is what the databook seems to list); I would like to know the real time consumed.
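As a rough illustration of what "real time consumed" means for such a test, the arithmetic can be sketched as below. It assumes two timed loops, one with the instruction under test and one without, each read with a cycle counter. The function name `cycles_per_op` and all numbers are hypothetical and are not results from the wiki page.

```python
def cycles_per_op(cycles_with_op, cycles_baseline, iterations, ops_per_iteration):
    """Estimate the real cost of one instruction from two timed loops.

    cycles_with_op:  cycle counter delta for the loop containing the instruction
    cycles_baseline: cycle counter delta for the same loop without it
    """
    if iterations <= 0 or ops_per_iteration <= 0:
        raise ValueError("iterations and ops_per_iteration must be positive")
    # Subtracting the baseline removes loop overhead from the estimate.
    extra = cycles_with_op - cycles_baseline
    return extra / (iterations * ops_per_iteration)


# Made-up numbers, only to show the arithmetic (not real Geode LX results):
# 1000 iterations, 8 copies of the instruction per iteration.
measured = cycles_per_op(cycles_with_op=51_000, cycles_baseline=3_000,
                         iterations=1000, ops_per_iteration=8)
print(measured)  # 6.0
```

A figure like this that sits well above the databook value would be consistent with stall time being left out of the listed clock counts, though a bug in the test program could produce the same picture.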

I am not a silicon designer, so I'm not the final word on if they are
correct or not, but at least that should prove that there isn't a
massive marketing conspiracy to hide the details of the processor
from our customers.  If they are lying to you, they are lying to me,
and they're not lying to me.

This conspiracy thing was not serious; I used a smiley at the end. However, from my perspective it makes no difference whether there is a conspiracy or not. In fact, what I think is that either I am mistaken and made some errors measuring this, or the technical writer made mistakes years ago and nobody cared to fix them.

Also the legend is clearly wrong in several cases, so that would probably need checking too (for example, on page 668, note 4 talks about 3DNOW ops in the table about FP ops).
That is a mistake - I have let the technical writer know about it.

Thanks!
Another error:
On page 631 it talks about this:

Conditional jump taken | Conditional jump not taken. (e.g., "4|1" = four clocks if jump taken, one clock if jump not taken).

It is never used in the opcode table.

absolutely no info about L2 cache miss penalties or mispredicted jumps or about the pipeline stages of the FP unit.
I don't have any information about L2 cache miss penalties, but they are easy to calculate. Please see:
http://homepages.cwi.nl/~manegold/Calibrator/

Could you run it on your machine and share the results? Currently I do not have access to an XO.

I will talk to somebody about documenting the FP unit pipeline.
It does handle 1 instruction per clock from the integer unit.
In practice we know that two floating point instructions back to
back will stall the IU.  I can also tell you that it is optimized
for single precision, so double precision is handled by microcode
and needs to go through the path again.

Thanks!

I would also like to know how many ALU units the FPU has. I mean, FMUL costs 1 and PFMUL costs 2. Is that because it has only 1 multiply unit and executes PFMUL serially? If that is the case, does that mean that the 3DNOW support is there only for compatibility and will not be faster than simple FP?
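The reasoning behind that question can be written out as a small calculation. It uses only the two clock counts quoted above and the fact that PFMUL multiplies two packed single-precision values. It ignores loads, stores, register shuffling and stalls, so it is a sketch of the arithmetic and not a statement about how the hardware is built.

```python
# Clock counts as quoted in this thread (FMUL 1, PFMUL 2); nothing here is measured.
FMUL_CLOCKS = 1    # one single-precision multiply
PFMUL_CLOCKS = 2   # 3DNow! packed multiply
PFMUL_LANES = 2    # PFMUL works on two single-precision values at once


def clocks_per_multiply(clocks, lanes):
    """Clocks spent per individual multiplication."""
    return clocks / lanes


scalar = clocks_per_multiply(FMUL_CLOCKS, 1)
packed = clocks_per_multiply(PFMUL_CLOCKS, PFMUL_LANES)
print(scalar, packed)  # 1.0 1.0
```

Equal per-multiply cost is what a single multiplier handling the two lanes one after the other would produce, but the numbers alone cannot confirm that this is the actual design.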

See, all I would like to have is enough data so that when I look at assembly code I can approximately calculate how many clock cycles will be consumed. Nothing more and nothing less.
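A minimal sketch of the kind of estimate meant here, assuming a per-instruction clock table plus a fixed penalty for back-to-back floating point ops. The table `CLOCKS`, the set `FP_OPS`, the `fp_stall` parameter and its default of 1 are all hypothetical placeholders; the real stall length is exactly the missing data.

```python
# Hypothetical per-instruction clock table. The values are placeholders taken
# from numbers quoted in this thread, not a verified Geode LX timing table.
CLOCKS = {
    "paddw": 2,  # "2 is listed for almost every MMX op"
    "fmul": 1,
    "pfmul": 2,
}
# Assumption: both of these count as FP ops for the back-to-back stall.
FP_OPS = {"fmul", "pfmul"}


def estimate_clocks(instructions, fp_stall=1):
    """Sum table clocks, adding fp_stall when two FP ops are back to back.

    fp_stall is an assumed penalty; the real stall length is not known here.
    An instruction missing from CLOCKS raises KeyError.
    """
    total = 0
    previous = None
    for op in instructions:
        total += CLOCKS[op]
        if op in FP_OPS and previous in FP_OPS:
            total += fp_stall
        previous = op
    return total


listing = ["paddw", "fmul", "fmul", "pfmul"]
print(estimate_clocks(listing))              # 8
print(estimate_clocks(listing, fp_stall=0))  # 6
```

With `fp_stall=0` the result is just the sum of the databook-style numbers, so the gap between the two printed values shows how much the estimate depends on the undocumented stall.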


You have nearly all the information you need, and you can collect the
additional information the same way we do, with careful analysis and
measurement.  In fact, Bernie and Vladimir Makarov have done a lot
of work already in this area, resulting in the Geode specific
code for gcc 4.2.0 and glibc.  Perhaps you can work with them to figure
out the finer details of the FPU scheduling.  I'm sure they would
appreciate it.

Jordan

_______________________________________________
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel
