Hi Dan,
You are almost right: usually the general architecture is the longest-lived.
However, in this case something like the Los Alamos National Laboratory
Roadrunner architecture has great advantages. The code needs to be
vectorized anyway, so why not run it on six vector processors instead of
one. Well, the Folding@home people's clients are about 20% PlayStations,
and these supply about 80% of the computing power.
And anyway, you may be right and I may be wrong...
Cheers,
Jouko
On-chip high-speed memory is a definite limit. I remember seeing a roughly
65,000-point FFT fitting into one fully functional Cell processor. Beyond
that size the thing slows down because main memory has to be used.
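As a rough back-of-the-envelope check (assuming single-precision complex
samples and the usual 256 KB of local store per SPE):

    65,536-point FFT:   65,536 x 8 bytes = 512 KB of sample data
    131,072-point FFT: 131,072 x 8 bytes =   1 MB of sample data
    local store:       256 KB per SPE, 6 SPEs under PS3 Linux = 1.5 MB
                       (8 SPEs on a fully functional Cell     = 2 MB)

So a transform around 64K points can stay resident across the local stores,
while the 128K-point case mentioned further down in this thread leaves little
or no room for twiddle factors, double buffers and code, and starts spilling
to main memory.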
"Life is pretty simple: You do some stuff. Most fails. Some works. You do
more of what works. If it works big, others quickly copy it. Then you do
something else. The trick is to do something else."
On Tue, 24 Jun 2008, Dan Werthimer wrote:
hi jouko,
i suspect you are correct -
one can probably use a cell or gpu as a CPU accelerator or
for simple algorithms that don't need much code development
or memory -- then when the architecture fashion changes to
something else, one won't have lost a lot of time
working on a specialized architecture and can
port to the next specialized platform of the day.
eg: if the code is dominated by FFT's, and
they are not large FFT's (they fit in memory),
then it's pretty easy to offload that part of the code
to a cell or GPU (assuming it isn't I/O dominated).
but the history of people attempting to do this
over the last 30 years is mostly a history of failure
(eg: array processors, fpga accelerators, etc)
- some people can get temporary speed-ups at the expense of a
lot of work. perhaps CUDA or some new language
will change this.
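for what it's worth, here's a minimal sketch of what offloading a batch of
FFT's with CUDA's cuFFT library might look like (the transform length and
batch count are made-up illustrative values, and error checking is omitted):

    #include <cuda_runtime.h>
    #include <cufft.h>

    /* sketch: push a batch of complex FFT's to the GPU.
       NX and BATCH are illustrative values only. */
    #define NX    131072
    #define BATCH 8

    void fft_on_gpu(cufftComplex *host_in, cufftComplex *host_out)
    {
        cufftComplex *dev;
        size_t bytes = sizeof(cufftComplex) * NX * BATCH;

        cudaMalloc((void **)&dev, bytes);
        cudaMemcpy(dev, host_in, bytes, cudaMemcpyHostToDevice);

        cufftHandle plan;
        cufftPlan1d(&plan, NX, CUFFT_C2C, BATCH);     /* BATCH transforms of length NX */
        cufftExecC2C(plan, dev, dev, CUFFT_FORWARD);  /* in-place forward transforms */

        cudaMemcpy(host_out, dev, bytes, cudaMemcpyDeviceToHost);
        cufftDestroy(plan);
        cudaFree(dev);
    }

(the two cudaMemcpy calls are where the "I/O dominated" caveat bites: if the
data crosses the PCIe bus just to be transformed once, the copies can eat
most of the speed-up.)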
there are a few exceptions - folding@home on cell, etc.
but not many. nvidia gave us funding to port seti@home
to a GPU and we tried and then gave them their money back.
(although this was before CUDA, and it's probably easier now).
best wishes,
dan
Jouko Ritakari wrote:
Hi Dan,
Usually I agree with you totally, but this time I have to disagree.
The Cell processor is so much better for limited tasks; let's use it for
them.
I agree it's a pretty painful programming environment; possibly we are better
than the top three Sony engineers. Mainly it's a matter of memory bandwidth:
you get about 25 GB/s per SPU and an aggregate of about 25 GB/s to main
memory, though I am not totally sure of the exact figures.
I totally disagree with the view on Beowulf clusters; the Los Alamos
National Laboratory approach seems more viable at the moment. I will change
my opinion when appropriate.
Cheers,
Jouko
"Life is pretty simple: You do some stuff. Most fails. Some works. You do
more of what works. If it works big, others quickly copy it. Then you do
something else. The trick is to do something else."
On Tue, 24 Jun 2008, Dan Werthimer wrote:
hi sergei and jouko,
i'm not an expert at PS3 -
my group tried to port a few programs -
some were successful - those with a small memory footprint.
it's a pretty painful programming environment.
seti@home was a disaster (seti@home does 128K point FFT's),
sony had three of their best engineers working on it for
a long time and finally gave up.
we've also played with GPU's.
i'm not a big fan of specialized architectures,
because they come and go, and you can't port software
to the next generation (eg: array processors, cell, GPU's,
bluegene torus interconnect, etc, are short lived).
i like beowulf clusters, with every node connected to every node
through a switch and MPI, because
they have been and will be around for a long time and it's relatively
easy to develop and port software through several generations of hardware.
dan
Jouko Ritakari wrote:
Hi all,
On Tue, 24 Jun 2008, Sergei Pogrebenko wrote:
I also had some discussions with Dan Werthimer about his FPGA plans
(in the wake of the uniboard project) and about the Sony PS3. He and his team
tried to rewrite SETI@home for the PS3, but sadly got it running well below
expectations. They contacted the Sony/IBM compiler guys, but for three months
there was no answer.
We fortunately have the answer.
Even vectorized and optimized code for the PS3 often gets only five to ten
per cent of the actual calculation capacity.
It's a bit tricky to get the roughly 90% we got; we took Daniel
Hackenberg's matmul program and modified it. Daniel actually got 97%
efficiency, but we didn't bother about the last seven per cent - it's less
expensive to buy more PlayStations.
The cross-pol feature is also of interest in the wake of the new (ATA-style)
very broadband linearly polarized feeds, where a conversion to circular
polarization has to be done. For that, the cross-pol complex spectrum
should be calculated.
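To make that concrete, here is a sketch of the per-channel arithmetic (the
sign convention for forming R and L from X and Y varies between
observatories, so the +/- j choice below is an assumption, and the interface
is made up for illustration):

    #include <complex.h>
    #include <math.h>

    /* Sketch: convert per-channel linear-polarization voltage spectra X, Y
       into circular R, L and accumulate the polarization products,
       including the cross-pol term. */
    void lin_to_circ(const float complex *X, const float complex *Y, int nchan,
                     float *rr, float *ll, float complex *rl)
    {
        const float s = 1.0f / sqrtf(2.0f);
        for (int k = 0; k < nchan; k++) {
            float complex R = s * (X[k] - I * Y[k]);
            float complex L = s * (X[k] + I * Y[k]);
            rr[k] += crealf(R * conjf(R));   /* RR* auto-correlation   */
            ll[k] += crealf(L * conjf(L));   /* LL* auto-correlation   */
            rl[k] += R * conjf(L);           /* RL* cross-pol spectrum */
        }
    }

The point is that RR*, LL* and RL* all contain the complex X*conj(Y) term,
which is why the cross-pol complex spectrum has to be computed and not just
the two auto-correlations.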
Our results are very much in line with FPGA spectrometers; although those
are much faster, it's difficult to get really high resolution with them.
Well, the Cell processors are the best you can have at the moment for some
of the tasks. FPGAs or display controllers (GPUs) are better for very limited
tasks, and normal Intel-architecture or PowerPC processors for very big ones.
If you have to do number-crunching, the Cell processor may be the best.
And for you CASPER people, the PS3 control program may be an even better
platform than PS3 Linux, although much trickier. The Folding@home people
are the ones to ask; they have the experience.
Best regards,
Jouko