* Guillem Jover | 2010-02-17 21:51:37 [+0100]:

>Yesterday did some superficial research, and I might have
>misunderstood stuff, but isn't the SPE in the E500 the same as the one
>on the Cells?
Nope, totally different thing: a Cell CPU has one general-purpose core
(the PPU) and (usually) 8 SPEs (SPE here is an acronym for Synergistic
Processing Element), each of which is an independent CPU. Each SPE
consists basically of an SPU (the Synergistic Processing Unit) and
256KiB of local store. The local store contains the code and data of
the program executed by the SPU. Each SPU has 128 general purpose
registers, each 128 bit wide. The PPU and SPU can kick off DMA
transfers to load/store data to/from the local store, and they can
communicate via mailbox messages. The SPE can not access the DDR
memory in any way other than the DMA transfers I mentioned. And the
SPE is a vector processor, not a scalar processor.

In contrast to this, the e500's SPE (SPE here is an acronym for Signal
Processing Engine) is capable of single instruction multiple data (SIMD)
and floating point instructions. (As far as I'm aware Freescale launched
their SPE before IBM/Sony/Toshiba launched theirs. It would probably
cause less confusion if ARM or MIPS or anyone else who is not doing
POWER would recycle those three letters :))
Back to the topic: the SPE in e500 powerpc cores is more or less a CPU
extension, _not_ another CPU. The e500's general purpose registers are
extended to 64bit. "Normal" opcodes can't touch the upper 32bit, so we
are still talking about a 32bit CPU. The SIMD instructions can use the
full width, so they can add two ints in one go, for instance. This SIMD
is not compatible with AltiVec (AltiVec has dedicated registers which
are 128bit wide). The SPE opcodes even use the same opcode range, so
AltiVec and SPE are mutually exclusive. The SPE floating point (which is
called embedded floating point) supports the type float (e500v1+) and
double (since e500v2). These can also utilise the upper 32bit (double
precision is 64bit in size). This floating point is executed in
hardware and is compatible with IEEE 754. The difference from the
"classic" PowerPC FPU is that we don't have dedicated registers for
floating point and use the general purpose registers instead.

>Also from reading some mails from the libc-alpha [0] list when the port
>was upstreamed, it seems that it might be possible to mix code built
>for powerpc SPE and for other powerpc features? So is it really necessary
>to build a whole port for this, isn't it possible to build specific
>libraries using the hwcap infrastructure instead, or do the objects
>built actually have a different ELF ABI and the objects would refuse to
>be linked together (like in the ARM case before the EABI)?
I haven't thought about this, but let me go through it:
- variant one: a library function like:
 float func(float a)
 {
     return a + 10;
 }

 You compile this function twice: once for powerpc where classic
 floating point is used, and once where embedded floating point is
 used. So we have every library twice, which will probably double the
 build time on the buildds. Now, what about the user of this library?
 Classic FPU would pass the variable a in register f1 (floating point
 register one; as I mentioned before, there are dedicated floating
 point registers). Embedded floating point will pass a in r3 (general
 purpose register three; since we don't have dedicated registers, we
 use the soft-float ABI here). So the application using the library
 would have to be compiled twice as well.
 
 Another idea that just came to my mind is to use floating point as we
 do now on powerpc. This requires floating point emulation in the kernel
 for e500 CPUs, and this is slow [0]. Then identify the hot paths (i.e.
 libraries which rely heavily on floating point) and compile those a
 second time with SPE. The problem here is that you can't mix hard and
 soft float due to the way arguments are passed, so you would need
 wrappers for them. This looks like a total pain in the ass: not only do
 you have to hunt down the libraries you want to optimize, you also have
 to come up with wrappers around them. Maybe there is even something I
 forgot :)

- variant two: an operation like a + b where we call into a library to
  compute the floating point operation. Here we would put the
  computation itself into a library like glibc/gcc, which would use
  classic or embedded floating point depending on hwcap. Again there is
  the problem of how to pass the arguments. Plus we don't utilize all
  registers and have a function call for every "simple" operation. Not
  only do we have a new ABI here, we also make it slow for everyone.

To summarize a little: yes, your gcc-4.3+ should be capable of creating
code for both sides, but I have _no_ idea how you could mix them
efficiently.

>I don't want to seem like I'm blocking the inclusion of this, just
>want to make sure there's been thoughts on all this stuff, as creating
>a new architecture involves lots of work, and more so if it might seek
>official inclusion in Debian. When there might be an easier way to
>handle this specific need.
No problem. An easier solution is also welcome, but I don't see one
right now. The ABI is different: it is soft float, not hard float.
Having the buildds rebuild the code without changes seems more
efficient than everything else I've come up with until now.

>> Do you think it's causing more trouble than good?
>
>
>thanks,
>guillem
>
>[0] <http://sourceware.org/ml/libc-ports/2007-10/msg00002.html>

[0] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=520877#48

Sebastian


