Ron, I thought Paul was talking about cache-coherent systems, on which a high-contention lock can become a huge problem. The work done by Jim Taft on the NASA project looks very interesting (and if you have pointers to papers about locking primitives on such a system, I would appreciate them), but it seems that system is memory coherent rather than cache coherent (coherency is maintained by the SGI NUMALink interconnect fabric).

And I agree with you: I also think (global) shared memory for IPC is more efficient than passing copied data across the nodes, and I suppose several papers tend to confirm this, since today's interconnect fabrics are a lot faster than memory-to-memory access.

My conjecture (I only have access to a simple dual-core machine) is about the locking primitives used for CSP (and IPC); I mean libthread, which is built on the rendezvous system call (and that call does use locking primitives; see 9/proc.c:sysrendezvous()). I think this is the only reason why CSP would not scale well.
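
To make that conjecture concrete, here is the kind of micro-benchmark I have in mind. It is only a sketch (untested, and the round count and stack size are arbitrary): chancreate/sendul/recvul/proccreate are the ordinary libthread calls, and because the two ends run in separate procs, every blocking send or receive should, if my reading of libthread is right, end up in rendezvous() and its locks.

#include <u.h>
#include <libc.h>
#include <thread.h>

/* Sketch: bounce a counter across two unbuffered channels.
 * The echoing end runs in its own proc (kernel process), so the
 * blocking send/recv pairs are where the conjectured rendezvous
 * locking cost would show up. */

enum { NROUND = 1000000 };	/* arbitrary */

static Channel *ping;	/* main -> echoer */
static Channel *pong;	/* echoer -> main */

static void
echoer(void *arg)
{
	ulong v;

	USED(arg);
	for(;;){
		v = recvul(ping);
		if(v == 0)		/* 0 is the stop signal */
			break;
		sendul(pong, v);
	}
	threadexits(nil);
}

void
threadmain(int argc, char *argv[])
{
	ulong i;

	USED(argc);
	USED(argv);
	ping = chancreate(sizeof(ulong), 0);	/* unbuffered: every send blocks */
	pong = chancreate(sizeof(ulong), 0);
	proccreate(echoer, nil, 8192);		/* separate proc, not just a coroutine */

	for(i = 1; i <= NROUND; i++){
		sendul(ping, i);
		recvul(pong);
	}
	sendul(ping, 0);
	threadexitsall(nil);
}

On my dual-core machine such a test cannot say much; the interesting question is how the cost per round trip grows once the procs are spread over many more cores.
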
Regarding my (other) conjecture about IPI, please read my answer to Paul.
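
To give a rough idea of the IPI-based scheme I keep referring to (this is only how I picture it, not something I have built): each pair of cores would share a single-producer/single-consumer mailbox that needs no locks at all, and the IPI would only be used to wake the destination core when its mailbox goes from empty to non-empty. A user-space sketch of such a mailbox, in standard C11 with invented names and sizes (the IPI/wakeup itself would of course live in the kernel):

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* One mailbox per (sender core, receiver core) pair.  No locks:
 * head is written only by the consumer, tail only by the producer. */

#define MBOX_SLOTS 256			/* power of two, illustrative */

typedef struct Mailbox Mailbox;
struct Mailbox {
	_Atomic size_t	head;		/* next slot to read (written only by consumer) */
	_Atomic size_t	tail;		/* next slot to write (written only by producer) */
	void		*slot[MBOX_SLOTS];
};

/* Producer side: returns false if the ring is full. */
static bool
mbox_push(Mailbox *m, void *msg)
{
	size_t t = atomic_load_explicit(&m->tail, memory_order_relaxed);
	size_t h = atomic_load_explicit(&m->head, memory_order_acquire);

	if(t - h == MBOX_SLOTS)
		return false;			/* full; caller decides what to do */
	m->slot[t % MBOX_SLOTS] = msg;
	/* release: the slot contents become visible before the new tail */
	atomic_store_explicit(&m->tail, t + 1, memory_order_release);
	/* if t == h the ring was empty: this is the point where the
	 * kernel would send an IPI to wake the destination core */
	return true;
}

/* Consumer side: returns NULL if the ring is empty. */
static void *
mbox_pop(Mailbox *m)
{
	size_t h = atomic_load_explicit(&m->head, memory_order_relaxed);
	size_t t = atomic_load_explicit(&m->tail, memory_order_acquire);
	void *msg;

	if(h == t)
		return NULL;			/* empty */
	msg = m->slot[h % MBOX_SLOTS];
	atomic_store_explicit(&m->head, h + 1, memory_order_release);
	return msg;
}

No compare-and-swap, no lock, and in the common case no kernel entry at all; whether this actually beats rendezvous-based channels on a big machine is exactly the part I cannot measure.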

Phil;

If the CSP system itself takes care of the memory hierarchy and uses no
 synchronisation (using an IPI to send a message to another core, for
 example), CSP scales very well.

Is this something you have measured or is this conjecture?

 Of course the IPI mechanism requires a switch to kernel mode, which costs
 a lot. But it is necessary only if the destination thread is running on
 another core, and I don't think latency is very important in algorithms
 requiring a lot of CPUs.

same question.

For a look at an interesting library that scaled well on a 1024-node
SMP at NASA Ames, see the work by Jim Taft.
Short form: use shared memory for IPC, not data sharing.

he's done very well this way.

ron


