Hey Harris,

From your description, I think you're running into one of the major bottlenecks of Collage: reading from many connections happens in one thread. We see this issue mostly in HPC contexts, where we have up to 150 nodes rendering for one master. One thread is just too slow to handle so many connections.
We even see connections "starving", because the reader never gets the chance to select the connections at the "back" of the connection set. This is another thing to be done for Collage: handle connections fairly on the reader side to avoid the starving effect. Every connection should have the same chance of being selected. Right now, the first connections are prioritized.

If you are desperate, you can try our multiple read threads implementation, which will improve reading throughput. Have a look at https://github.com/rttag/Collage/commits/master, the commits from Nov 18th to 28th, maybe also pick the one from Dec 18th and definitely the one from Jan 3rd. Unfortunately, this Collage is not totally up to date, so you may have to merge a bit. Also, we don't use this in production yet, so there might be bugs. But in first tests we saw a huge performance gain for scalability scenarios, even with a low number of clients, and also improvements in many other areas.

This topic is very interesting for us as well, so please keep me updated.

Regards,
Carsten

P.S.: Maybe you're not even having a software issue. We had a similar problem at a customer (twice by now). After quite some analysis we identified the switch as one of the problems, and a virus scanner with "intrusion prevention" was deep-scanning all the packets and therefore stalling network traffic. But I guess you're not running Windows, are you?

Hello everyone,

we've been running Equalizer for a while in a large visualization cluster. Our architecture has 18 nodes, each with 4 GPUs. The Equalizer configuration file actually defines 72 nodes, each with one pipe, one window and finally one channel (with the pipe assigned to the corresponding GPU on the system and the OpenGL context getting properly created there). Furthermore, for each GPU, we define one canvas and one segment. So we end up with 72 nodes and 72 canvases.
What I am observing is best described as jittering or frame stuttering, happening about once every second. The frame rate drops from hundreds of FPS down to 1-2 very briefly and then recovers, only to drop again soon thereafter. This is not related to rendering complexity (it happens with very simple scenes and also with the various eq samples).

I did some profiling on the AppNode driving the cluster in order to narrow down the source of the issue. I am noticing hotspots in co::LocalNode::_runReceiverThread (38.17% of all samples). In particular, a lot of time seems to be spent within co::LocalNode::_handleData (26.6% of all CPU time), and approximately 12.7% in the call to ConnectionSet::select() within the same function (_runReceiverThread). The second hotspot that I've noticed is in the ServerThread::run() function, more specifically in _cmdStartFrame() (roughly 25% of CPU time spent there).

Our application is relatively simple, with a basic distributed object for application state (a few KB in size). This object gets committed 2-3 times during a single event/frame loop.

I've tried a number of things to work around this:

- Forcing swap-sync off throughout the cluster
- Trying different pipe threading modes
- Setting up RSP (which seems to work but makes no difference)
- Playing around with swap barriers
- Disabling statistics collection

None of the above made any significant difference in terms of performance. My next step is to simplify the Equalizer configuration by fixing the mess that I currently have with the 72 canvases and actually use 4 canvases (one per wall) with properly defined segments.

Meanwhile, I'd love to get people's input on the above!

Thank you all,
Harris

--
View this message in context: http://software.1713.n2.nabble.com/Jittery-performance-with-large-cluster-tp7585928.html
Sent from the Equalizer - Parallel Rendering mailing list archive at Nabble.com.
_______________________________________________
eq-dev mailing list
[email protected]
http://www.equalizergraphics.com/cgi-bin/mailman/listinfo/eq-dev
http://www.equalizergraphics.com
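
The "starving" effect described in the reply, where connections at the front of the set are always picked first, can be illustrated with a small sketch. This is not Collage code and does not reflect how co::ConnectionSet::select() is implemented; ReadyFlags, selectFrontBiased and FairSelector are hypothetical names made up for illustration. It only contrasts a scan that always starts at index 0 with one that resumes just after the connection served last, which is one common way to give every connection the same chance.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a connection set: ready[i] is true when
// connection i has data waiting. This is not a Collage type.
using ReadyFlags = std::vector<bool>;

// Returned when no connection has data.
constexpr std::size_t kNone = static_cast<std::size_t>(-1);

// Front-biased scan: always starts at index 0, so a busy connection near
// the front wins every time and connections further back can starve.
std::size_t selectFrontBiased(const ReadyFlags& ready)
{
    for (std::size_t i = 0; i < ready.size(); ++i)
        if (ready[i])
            return i;
    return kNone;
}

// Round-robin scan: starts just after the connection served last time,
// so every ready connection is reached within one pass over the set.
class FairSelector
{
public:
    std::size_t select(const ReadyFlags& ready)
    {
        const std::size_t n = ready.size();
        for (std::size_t k = 0; k < n; ++k)
        {
            const std::size_t i = (_next + k) % n;
            if (ready[i])
            {
                assert(i < n);       // index stays inside the set
                _next = (i + 1) % n; // resume after this one next time
                return i;
            }
        }
        return kNone; // nothing ready (or empty set)
    }

private:
    std::size_t _next = 0; // index at which the next scan starts
};
```

With four connections that all have data pending, repeated calls to selectFrontBiased keep returning connection 0, while repeated calls to FairSelector::select cycle through 0, 1, 2, 3 and back to 0. A real reader would still have to combine this with the blocking wait on the sockets, which the sketch leaves out.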

