Ron, I thought Paul was talking about cache-coherent systems, on which a high-contention lock can become a huge problem. The work done by Jim Taft on the NASA project looks very interesting (and if you have pointers to papers about locking primitives on such a system, I would appreciate them), but it seems that system is memory coherent rather than cache coherent (coherency is maintained by the SGI NUMALink interconnect fabric).

And I agree with you: I also think (global) shared memory for IPC is more efficient than passing copied data across the nodes, and I suppose several papers tend to confirm this, since today's interconnect fabrics are a lot faster than memory-to-memory access.

My conjecture (I only have access to a simple dual-core machine) is about the locking primitives used for CSP (and IPC); I mean libthread, which is built on the rendezvous system call (and that call does use locking primitives; see 9/proc.c:sysrendezvous()). I think this is the only reason why CSP would not scale well.
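
To make that conjecture concrete, here is the kind of micro-benchmark I have in mind. It is only a sketch (untested, and the round count and stack size are arbitrary): chancreate/sendul/recvul/proccreate are the ordinary libthread calls, and because the two ends run in separate procs, every blocking send or receive should, if my reading of libthread is right, end up in rendezvous() and its locks.

#include <u.h>
#include <libc.h>
#include <thread.h>

/* Sketch: bounce a counter across two unbuffered channels.
 * The echoing end runs in its own proc (kernel process), so the
 * blocking send/recv pairs are where the conjectured rendezvous
 * locking cost would show up. */

enum { NROUND = 1000000 };	/* arbitrary */

static Channel *ping;	/* main -> echoer */
static Channel *pong;	/* echoer -> main */

static void
echoer(void *arg)
{
	ulong v;

	USED(arg);
	for(;;){
		v = recvul(ping);
		if(v == 0)		/* 0 is the stop signal */
			break;
		sendul(pong, v);
	}
	threadexits(nil);
}

void
threadmain(int argc, char *argv[])
{
	ulong i;

	USED(argc);
	USED(argv);
	ping = chancreate(sizeof(ulong), 0);	/* unbuffered: every send blocks */
	pong = chancreate(sizeof(ulong), 0);
	proccreate(echoer, nil, 8192);		/* separate proc, not just a coroutine */

	for(i = 1; i <= NROUND; i++){
		sendul(ping, i);
		recvul(pong);
	}
	sendul(ping, 0);
	threadexitsall(nil);
}

On my dual-core machine such a test cannot say much; the interesting question is how the cost per round trip grows once the procs are spread over many more cores.
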
Regarding my (other) conjecture about IPI, please read my answer to Paul.
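
To give a rough idea of the IPI-based scheme I keep referring to (this is only how I picture it, not something I have built): each pair of cores would share a single-producer/single-consumer mailbox that needs no locks at all, and the IPI would only be used to wake the destination core when its mailbox goes from empty to non-empty. A user-space sketch of such a mailbox, in standard C11 with invented names and sizes (the IPI/wakeup itself would of course live in the kernel):

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* One mailbox per (sender core, receiver core) pair.  No locks:
 * head is written only by the consumer, tail only by the producer. */

#define MBOX_SLOTS 256			/* power of two, illustrative */

typedef struct Mailbox Mailbox;
struct Mailbox {
	_Atomic size_t	head;		/* next slot to read (written only by consumer) */
	_Atomic size_t	tail;		/* next slot to write (written only by producer) */
	void		*slot[MBOX_SLOTS];
};

/* Producer side: returns false if the ring is full. */
static bool
mbox_push(Mailbox *m, void *msg)
{
	size_t t = atomic_load_explicit(&m->tail, memory_order_relaxed);
	size_t h = atomic_load_explicit(&m->head, memory_order_acquire);

	if(t - h == MBOX_SLOTS)
		return false;			/* full; caller decides what to do */
	m->slot[t % MBOX_SLOTS] = msg;
	/* release: the slot contents become visible before the new tail */
	atomic_store_explicit(&m->tail, t + 1, memory_order_release);
	/* if t == h the ring was empty: this is the point where the
	 * kernel would send an IPI to wake the destination core */
	return true;
}

/* Consumer side: returns NULL if the ring is empty. */
static void *
mbox_pop(Mailbox *m)
{
	size_t h = atomic_load_explicit(&m->head, memory_order_relaxed);
	size_t t = atomic_load_explicit(&m->tail, memory_order_acquire);
	void *msg;

	if(h == t)
		return NULL;			/* empty */
	msg = m->slot[h % MBOX_SLOTS];
	atomic_store_explicit(&m->head, h + 1, memory_order_release);
	return msg;
}

No compare-and-swap, no lock, and in the common case no kernel entry at all; whether this actually beats rendezvous-based channels on a big machine is exactly the part I cannot measure.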

Phil;

If the CSP system itself takes care of the memory hierarchy and uses no
 synchronisation (using an IPI to send a message to another core, for
 example), CSP scales very well.

Is this something you have measured or is this conjecture?

 Of course the IPI mechanism requires a switch to kernel mode, which costs
 a lot. But it is necessary only if the destination thread is running on
 another core, and I don't think latency is very important in algorithms
 requiring a lot of CPUs.

same question.

For a look at an interesting library that scaled well on a 1024-node
SMP at NASA Ames, see the work by Jim Taft.
Short form: use shared memory for IPC, not data sharing.

he's done very well this way.

ron


