Ben,
It's not an insane thing to look at. My gut is that you will find that it's
not the right choice, but it really depends on the details of your
situation. Here's how I'd look at the problem:
1. How much total computation do you need (expressed, say, in "standard PC
hours")? If your total need is 10 PC years, that'll drive a very different
solution than if you need 10 million PC years.
2. What's the useful computation / $ ratio of a Cell system to a
conventional system?
3. What's the startup cost of doing anything on the Cell (signing up for the
dev program, setting up your first dev system, learning the tools, etc.)?
4. What's the productivity penalty of all incremental work on the Cell
system (i.e., working with substandard / non-standard tools, adapting to
the Cell architecture, doing remote debugging, etc.)?
We're mostly arguing about #2, but my gut is that for a very wide range of
answers to #1, the answers to #3 and #4 will drive you to a conventional
architecture.
A couple of things to bear in mind: Sony has a reputation for producing very
substandard dev kits. I can't speak to the PS3 kit, but it's something I'd
worry about.
Remember also that the Cell strips out a lot of the things you've come to
expect from modern CPUs. So, for example, if your code depends heavily on
good branch prediction (which interpreters often do), you could be in
trouble. In general, the Cell is likely to do best at operations that
involve massive repetition of fairly simple non-branching floating point
operations. The claim (which many people believe isn't entirely accurate)
is that it should be blazing fast at SIMD operations like manipulating
massive amounts of vertex data.
-mattb
> -----Original Message-----
> From: Ben Goertzel [mailto:[EMAIL PROTECTED]
> Sent: Sunday, November 27, 2005 12:53 PM
> To: [email protected]
> Subject: Re: [agi] /. [Unleashing the Power of the Cell Broadband Engine]
>
> Matt,
>
> Hmmm ... I guess I need to be clearer about my conjectured potential
> use for the Cell within Novamente or other AI systems.
>
> I agree with most of your general sentiments about the obstacles to
> using specialized architectures within AGI systems, but I don't feel
> your comments answer my specific questions about making a specialized
> GP/BOA engine on the SPE. I'd be curious for any further comments
> from your end, specifically targeted to the issues I'll mention in the
> rest of this email.
>
> Firstly, just to be super-clear: I would not want to put an entire AGI
> on the Cell for many reasons, including the limited RAM and the
> expected short lifespan of the architecture. What I was thinking was
> that it could make sense to write a specialized "GP program learning
> module" for the Cell.
>
> I understand that this would require writing code completely from
> scratch, but this is not a big deal because writing GP is not very
> much code. (And the same holds for the probabilistic variants of GP
> that we use in Novamente.)
>
> Also, this wouldn't require a lot of programmers to understand the
> Cell architecture. Basically, it would require one really smart guy
> to understand the Cell architecture and spend a few dedicated months
> writing a general GP (or BOA, etc.) system for the Cell. One could
> then use PS3s basically as "GP/BOA boxes" and plug them into AGI
> systems or other applications, which would communicate with them via
> sockets in a way that requires no knowledge of Cell internals.
>
> The viability of this idea, however, really comes down to the
> question: how slow is access to main RAM from the SPEs? Is
> this significantly slower than on a traditional modern computer? If
> so then the idea may not be viable (unless one is doing simple GP/BOA
> problems where the fitness function can fit inside 256KB of memory,
> which is not generally the case with AGI-related evolutionary learning
> problems).
>
> Pertinent to this question, the link I sent before says
>
> "
> The most productive SPE memory-access model appears to be the one in
> which a list (such as a scatter-gather list) of DMA transfers is
> constructed in an SPE's local store so that the SPE's DMA controller
> can process the list asynchronously while the SPE operates on
> previously transferred data. In several cases, this new approach to
> accessing memory has led to application performance exceeding that of
> conventional processors by almost two orders of magnitude,
> significantly more than anyone would expect from the peak performance
> ratio (about 10x) between the Cell Broadband Engine and conventional
> PC processors.
> "
>
> The question then seems to be whether this type of memory-access model
> can be used in the case where the SPE is running a simple interpreter
> that is interpreting a GP program tree; and the program tree, in order
> to execute, needs to grab a lot of data from main memory.
>
> It would seem that if the interpreter is clever in terms of partial
> evaluation, this might be the case, because at each point in time during
> the interpretation process, the interpreter could partially evaluate
> the program tree based on the data that has asynchronously come into
> the SPE already, and then process the further data when it comes
> in.... This assumes that most of the data needed by a program tree
> will be determinable via simple inspection of the program tree up
> front, rather than only determinable during the course of evaluation
> of the program tree (an assumption that I believe will generally be
> valid during GP/BOA program tree learning).
>
> I'd be curious for highly specific thoughts on this from anyone on
> this list who understands both genetic programming and computer
> architecture (I'm strong in the former but weak in the latter).
>
> thanks
> Ben
>
> On 11/27/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > > Repeat after me: "it depends on your problem". Some (many) codes will
> > > bite. Some will run like rabid foxes on meth.
> > Yes. My assertion is that for most interesting real-world problems, the
> > Cell isn't a good fit.
> >
> > > It's time to dive into the parallel programming model. If you can't
> > > state your problem in terms of asynchronous message passing (not just
> > > threads), you've got a problem on your hands that will only get
> > > worse with time.
> > In the real world, I think this is mostly wrong. Remember your
> > constraints: programmers (especially good ones) are extremely rare and
> > expensive. Fast computers are not. Unless your problem is truly massive
> > (Google-scale), you should be optimizing for programmer productivity
> > rather than FLOPS / $. That means using as little parallelism as
> > possible. In most cases, that means no parallelism at all-- a modern
> > CPU can handle an astonishing amount of work, if well programmed.
> >
> > If you must use parallelism, I assert (and it sounds like you agree)
> > that a message-based architecture is very often a better choice than a
> > multi-threaded one, purely because it's easier to work with.
> >
> > Multi-core architectures are obviously the future. My objection is to
> > the specifics of the Cell architecture, which makes trade-offs around
> > symmetry and memory access which I (and many others) consider very
> > sub-optimal for most real-world applications.
> >
> > You're right that current CPU architectures have serious issues with
> > memory latency, and that managing those issues effectively is part of
> > what separates good from mediocre programmers. The problem is that
> > those issues look to be much more severe on the Cell than on competing
> > architectures.
> >
> > There's another real-world issue with the Cell, which has to do with
> > lifecycle. The very strong consensus in the gaming community is that
> > to write a decent PS3 app, you'll need to throw away all your existing
> > code and start from scratch. The first generation of apps will
> > probably be profoundly mediocre, as developers take time to get a feel
> > for the new architecture. The lifetime of the architecture will be
> > about 5 years, at the end of which time all code written for it is
> > almost certain to be a dead end. That's painful but survivable if
> > you're in the console games business. It's a disaster if you're in the
> > AI business (unless your timeframe for a seed AI is < 5 years...)
> >
> > This is another example of Sony optimizing for the wrong problem--
> > they're maximizing theoretical FLOPS at the expense of real-world
> > programmer productivity.
> >
> > Don't get me wrong-- I make my living writing massive distributed
> > applications. When you have to parallelize, you have to parallelize.
> > But you should do so in a very thoughtful and deliberate manner.
> >
> > -mattb
> >
> >
> > -------
> > To unsubscribe, change your address, or temporarily deactivate your
> > subscription, please go to
> > http://v2.listbox.com/member/[EMAIL PROTECTED]
> >