Hey Harris,

>From your description, I think you're hitting one of the major 
>bottlenecks of Collage: reading from many connections happens in a single thread. 
>We see this issue mostly in HPC contexts, where we have up to 150 nodes 
>rendering for one master. One thread is just too slow to handle that many 
>connections.

We even see connections "starving", because the reader never gets a chance 
to select the connections at the "back" of the connection set. This is another 
item on the to-do list for Collage: handle connections fairly on the reader 
side to avoid starvation. Every connection should have the same chance of 
being selected; right now, the first connections in the set are prioritized.

If you are desperate, you can try our implementation with multiple read 
threads, which improves reading throughput. Have a look at 
https://github.com/rttag/Collage/commits/master : the commits from Nov 18th to 
28th, maybe also the one from Dec 18th, and definitely the one from Jan 3rd. 
Unfortunately, this Collage fork is not fully up to date, so you may have to do 
some merging. Also, we don't use this in production yet, so there might be 
bugs. But in initial tests we saw a huge performance gain in scalability 
scenarios, even with a low number of clients, as well as improvements in many 
other areas.

This topic is very interesting for us as well; please keep me updated.

Regards,
Carsten

P.S.: Maybe you don't even have a software issue. We had a similar problem 
at a customer site (twice by now). After quite some analysis, we identified the 
switch as one of the problems, along with a virus scanner with "intrusion 
prevention" that was deep-scanning all packets and therefore stalling network 
traffic. But I guess you're not running Windows, are you?





Hello everyone,

we've been running Equalizer for a while on a large visualization cluster.
Our architecture has 18 nodes, each with 4 GPUs. The Equalizer configuration 
file actually defines 72 nodes, each with one pipe, one window and one channel 
(with the pipe assigned to the corresponding GPU on the system and the OpenGL 
context properly created there). Furthermore, we define one canvas and one 
segment for each GPU, so we end up with 72 nodes and 72 canvases.

What I am observing is best described as jittering or frame stuttering, 
happening about once per second. The frame rate briefly drops from hundreds of 
FPS down to 1-2 FPS and then recovers, only to drop again shortly thereafter.
This is not related to rendering complexity: it happens with very simple scenes 
and also with the various eq samples.

I did some profiling on the AppNode driving the cluster to narrow down the 
source of the issue. I am seeing hotspots in 
co::LocalNode::_runReceiverThread (38.17% of all samples). In particular, a 
large share of the time is spent in co::LocalNode::_handleData (26.6% of all 
CPU time), and approximately 12.7% in the call to
ConnectionSet::select() within the same function (_runReceiverThread). The 
second hotspot I've noticed is in ServerThread::run(), more specifically in 
_cmdStartFrame() (roughly 25% of CPU time).

Our application is relatively simple, with a basic distributed object for 
application state (a few KB in size). This object gets committed 2-3 times 
during a single event/frame loop.

I've tried a number of things to work around this:
- Forcing swap-sync off throughout the cluster
- Trying different pipe threading modes
- Setting up RSP (which seems to work but makes no difference)
- Playing around with swap barriers
- Disabling statistics collection

None of the above made any significant difference in terms of performance.

My next step is to simplify the Equalizer configuration by cleaning up the mess 
I currently have with the 72 canvases and instead using 4 canvases (one per 
wall) with properly defined segments. Meanwhile, I'd love to get people's input 
on the above!

Thank you all,

Harris





--
View this message in context: 
http://software.1713.n2.nabble.com/Jittery-performance-with-large-cluster-tp7585928.html
Sent from the Equalizer - Parallel Rendering mailing list archive at Nabble.com.

_______________________________________________
eq-dev mailing list
[email protected]
http://www.equalizergraphics.com/cgi-bin/mailman/listinfo/eq-dev
http://www.equalizergraphics.com
