Hi! I have some questions about the scheduling of instructions, since the PDF is quite sparse. I have also read this thread: http://lists.laptop.org/pipermail/devel/2006-August/001232.html
1. As I understand it, the Geode can issue only 1 op/clock even though it has 2 execution units. Is that correct?

2. Does the IU (Integer Unit) perform integer MUL/DIV?

3. Is the IU pipelined? In other words, are the clock numbers latencies or absolute times?

4. Is the FPU pipelined? Again, are the clock numbers latencies or absolute times?

5. There is the following text on page 656:

   "The CPU is functionally divided into the Floating Point Unit (FPU) unit and the Integer Unit. The FPU has been extended to process MMX, AMD 3DNow!, and floating point instructions in parallel with the Integer Unit. When the Integer Unit detects an MMX instruction, the instruction is passed to the FPU for execution. The Integer Unit continues to execute instructions while the FPU executes the MMX instruction. If another MMX instruction is encountered, the second MMX instruction is placed in the MMX queue. Up to six MMX instructions can be queued. When the Integer Unit detects a floating point instruction without memory operands, after two clock cycles the instruction passes to the FPU for execution. The Integer Unit continues to execute instructions while the FPU executes the floating point instruction. If another FPU instruction is encountered, the second FPU instruction is placed in the FPU queue. Up to four FPU instructions can be queued. In the event of an FPU exception, while other FPU instructions are queued, the state of the CPU is saved to ensure recovery."

   What is this 2-clock-cycle stall? Is it the time the op takes to pass through unused stages? And what about 3DNow! instructions?

6. PFRCP has a note 1:

   "1) These instructions must wait for the FPU pipeline to flush. Cycle count depends on what instructions are in the pipeline."

   PFRCPV does not have this note. Is that a bug? PFRCPIT1 has it, but PFRSQRT does not. What is the reason? Or does it depend on whether the op is implemented via microcode?

7. What do Way0, Way1, Way2, and Way3 mean on page 617?

   "3) Any needed memory operands are in the cache in the last accessed way (i.e., Way0, Way1, Way2, or Way3). Add two clocks if not in last accessed way."

8. On page 617:

   "8) For non-cached memory accesses, add several clocks. Cache miss accesses are approximately an additional 25 clocks, the exact number depends upon the cycle/operation running."

   Does this mean that a cache miss stalls the execution unit for ~25 clocks? Is that the cost of going to main RAM? If so, how many clocks does a read from L1 take, and how many from L2? In that case, will it stall the load/store unit, or can the execution unit that is not stalled still access memory?

9. Do you have hard numbers for sequential/random 8-byte read/write speed on the OLPC machine (that is, with the exact RAM and LX 800 processor used in the OLPC machine)? What about MOVNTQ sequential speed?

10. If MASKMOVQ skips some bytes, does that mean it will not read the skipped bytes from RAM? Or will it have the same speed as MOVNTQ?

11. Does the following refer to the FPU as a whole, or just to the FP ops (FMUL/FSIN)?

    http://mailman.laptop.org/pipermail/devel/2006-August/001323.html

    "This is correct. Two FP instructions cannot be issued on subsequent cycles."

    "The Geode pipeline is very simple. We're not superscalar in any way, shape or form."

    That is why I asked 4 and 5...

Note that I do not have an OLPC machine and will not ask for one before I have some working code. But I need this information to run the inner loops on a paper processor...

Thanks in advance!

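P.S. To make the "paper processor" idea concrete, here is a minimal sketch of the kind of estimate I have in mind. It assumes a strictly in-order, single-issue model with no overlap between instructions, and it treats the clock numbers as absolute times, which is exactly what questions 3 and 4 are trying to settle. The +2 clocks and ~25 clocks penalties are taken from the page 617 notes quoted above. The base clock counts, the function name and the constant names are made up for illustration and do not come from the data book.

```python
# Hypothetical single-issue "paper processor" estimate (illustration only).

WAY_MISS_PENALTY = 2     # page 617, note 3: operand not in the last accessed way
CACHE_MISS_PENALTY = 25  # page 617, note 8: approximate, varies with the operation

# Extra clocks per memory-operand state. Assumption: a cache miss pays only
# the miss penalty, not the way penalty on top of it.
EXTRA_CLOCKS = {
    None: 0,                      # no memory operand
    "last_way": 0,                # hit in the last accessed way
    "other_way": WAY_MISS_PENALTY,
    "miss": CACHE_MISS_PENALTY,
}


def estimate_clocks(ops):
    """Sum clocks for a list of (base_clocks, mem_state) tuples.

    Assumes instructions never overlap, so the total is a plain sum.
    """
    return sum(base + EXTRA_CLOCKS[mem] for base, mem in ops)


# Placeholder loop body; the base clock counts here are invented.
loop_body = [
    (1, "last_way"),
    (1, None),
    (2, "other_way"),
    (1, "miss"),
]

print(estimate_clocks(loop_body))
```

If the IU or the FPU turns out to be pipelined, a plain sum like this overestimates, and the model would need separate latency and issue-interval numbers per instruction.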