Re: [osg-users] OSG thread profiling results are in!!

2008-07-03 Thread Paul Speed

You are sending in code again. ;)
-Paul

James Killian wrote:
 

We are using some particle effects pretty heavily, and we noticed (using 
filemon) that the smoke image file is being read over and over again, 
many times (perhaps once per frame).  Is this possible?  We are going to 
look into that next.  Maybe we can cache the single image (state set)?


 
I found a way to cache the images.  The Registry in osgDB has the
ability to set options.  It appears that _options is NULL by default,
but the options have a CACHE_IMAGES flag to use along with others.  This
works for any code that uses readImage(filename).  There are other
flags to explore too.

Does anyone have any experience with using these options?  Are there any
others that should or should not be used?
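
For reference, here is a minimal sketch of what a filename-keyed image cache does, written against the C++ standard library only rather than osgDB. `Image`, `ImageCache` and the loader callback are hypothetical names used for illustration. In OSG itself the equivalent switch is, as far as I know, `setObjectCacheHint()` with `CACHE_IMAGES` on the options object handed to `osgDB::Registry::instance()->setOptions()`, but check the headers of your OSG version before relying on that.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a decoded image; not an OSG type.
struct Image {
    std::string sourceFile;
};

// Caches images by filename so repeated requests do not hit the disk again.
class ImageCache {
public:
    using Loader = std::function<std::shared_ptr<Image>(const std::string&)>;

    explicit ImageCache(Loader loader) : loader_(std::move(loader)) {}

    // Returns the cached image, calling the loader only on the first request.
    std::shared_ptr<Image> read(const std::string& filename) {
        auto it = cache_.find(filename);
        if (it != cache_.end()) return it->second;
        std::shared_ptr<Image> image = loader_(filename);
        if (image) cache_.emplace(filename, image);  // failed loads are not cached
        return image;
    }

    std::size_t size() const { return cache_.size(); }

private:
    Loader loader_;
    std::unordered_map<std::string, std::shared_ptr<Image>> cache_;
};
```

With a cache like this in front of the loader, a particle system that asks for the same smoke image every frame gets the same object back and the file is read only once.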
 
 
 
James Killian


___
osg-users mailing list
osg-users@lists.openscenegraph.org
mailto:osg-users@lists.openscenegraph.org

http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org






Re: [osg-users] OSG thread profiling results are in!!

2008-07-03 Thread James Killian

Gahhh... thanks for the heads up... @#$#@ Microsoft...  I've disabled HTML
and am going to use plain text from now on.

- Original Message - 
From: Paul Speed [EMAIL PROTECTED]
To: OpenSceneGraph Users osg-users@lists.openscenegraph.org
Sent: Thursday, July 03, 2008 3:54 AM
Subject: Re: [osg-users] OSG thread profiling results are in!!


 You are sending in code again. ;)
 -Paul


Re: [osg-users] OSG thread profiling results are in!!

2008-07-02 Thread rpingry
Hi Robert,

I got the stats handler working on our scene and displaying.  I am not sure
I understand what the different numbers mean and how I might work with
them.  I can see the optimization effort is a big deal.  I know it is beyond
the scope of this group.  Are there any resources out there to look at?

I have finished the work you had already mentioned, like using PNG rather
than BMP everywhere.  We are also working on making sure our images are as
small as possible.  We are also going to work on using LOD.  Since we are in
space and most ships are far away, we are sure we can make a big jump
there.  I used osgUtil::Optimizer and that gave me a few more frames.

What are some other suggestions?

We are using some particle effects pretty heavily, and we noticed (using
filemon) that the smoke image file is being read over and over again, many
times (perhaps once per frame).  Is this possible?  We are going to look
into that next.  Maybe we can cache the single image (state set)?

Thanks
-- Rick





-- 
 Rick
Check us out at http://fringe-online.com/


Re: [osg-users] OSG thread profiling results are in!!

2008-06-28 Thread Robert Osfield
Hi James,

I've read your emails but I'm afraid the stats mean absolutely nothing
to me.  One will really need to find out what parts of the OSG, i.e.
what function calls, are the current bottleneck.

As a general note, performance optimization with scene graphs is almost
always an issue of improving the balance of the scene graph.  Be it
update, cull, draw dispatch or draw GPU, it's almost always a poor
scene graph that is at fault.  You can often improve performance by
10x and more by simply fixing the scene graph.  Doing low level code
optimization will rarely get you anything like the performance
improvement that you'd get by just fixing the scene graph.

Given this, diving into low level profiling could well be a case of
not seeing the wood for the trees.  So I'd recommend, if you want
your app to go faster, starting with the basics: are you CPU or GPU
limited?  Then are you update, cull or draw dispatch limited?  Then,
depending upon what results you get, consider why the scene graph
itself is making things so slow.  This process will typically lead you
to things you can do to your scene graph to fix the performance
bottleneck, and all this without touching the actual code.
Performance optimization is a huge topic, but hopefully I'll have given
you a little pointer to the priorities I'd apply.
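
The order of questions above (CPU or GPU limited first, then which CPU phase) can be sketched as a small decision function. This is only an illustration in standard C++: `FrameTimes` and `firstSuspect` are hypothetical names, the timings are assumed to be per-frame milliseconds read off a stats display, and the three CPU phases are assumed to run one after another in a single thread, which will not hold for every threading model.

```cpp
#include <string>

// Per-frame timings in milliseconds, e.g. read off an on-screen stats display.
struct FrameTimes {
    double updateMs;
    double cullMs;
    double drawDispatchMs;
    double gpuMs;
};

// Names the phase to look at first. Assumes the three CPU phases run one
// after another in the same thread, so their sum is the CPU cost of a frame.
std::string firstSuspect(const FrameTimes& t) {
    const double cpuMs = t.updateMs + t.cullMs + t.drawDispatchMs;
    if (t.gpuMs > cpuMs) return "draw GPU";
    if (t.updateMs >= t.cullMs && t.updateMs >= t.drawDispatchMs) return "update";
    if (t.cullMs >= t.drawDispatchMs) return "cull";
    return "draw dispatch";
}
```

The answer only says where to start looking in the scene graph; it does not say what is wrong with it.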

Robert.

On Sat, Jun 28, 2008 at 4:12 AM, James Killian
[EMAIL PROTECTED] wrote:

 Here are some interesting profile results from the threaded profiler.
 First, here is the groundwork:
 OSG SVN 8482 using VS 7.1 with threading enabled (interlocked config).  The
 actual client code tested that pushes some stress on OSG is our game, which
 anyone can download here: http://www.fringe-online.com/.  I run this and
 measure the thread performance using Intel's thread compiler.  So far, our
 client code main loop is very similar to how it is in the osg viewer (no
 fancy optimizations).

 There are 2 machines I have tested now... I'll post a copy of a different
 message I sent a few days ago here (to keep all info in this thread)

 ---snip
 Robert,
 Will this proposal you mention for 2.6 help balance the CPU workload
 against the GPU I/O bottleneck?

 I've been doing some osg performance benchmark research on thread
 synchronization using the Intel Threaded compiler, and so far the results
 are looking really good except for a 26% over-utilization due to sleeping. I
 do want to say awesome job to those responsible for threading, the amount of
 critical section use looked very good!  All the worker threads also had good
 profiling results.

 The ultimate test I want to try today deals with an intentional GPU
 bottleneck, where I have a quad core that pipes graphics out to a PCI
 graphics card.  If anyone is interested I'll post these test results.  I
 know now that on a quad core there is a lack of parallelization (e.g. 25%
 85% 15% 15%), but that is a different battle for a different time.

 I do want to get to the bottom of the profiling and determine how well the
 workload is balanced against the gpu i/o, and see if there is some
 opportunity for optimization here.
 -snip

 Today I have the numbers from the souped-up machine with a poor, poor PCI
 graphics card.  The first thing to note is that the game never exceeded 18%
 CPU usage!!  When I profiled, 65% of the main thread was devoted to serial
 time and the bulk of the CPU time was on *this thread* and the
 PrintSchedulingInfo [20] thread.  Thread 20 showed 21% contributed to
 blocking, but the rest of it was active.  The rest of the threads (like with
 my machine) looked really good!  It is just too bad they don't do much work.


 Realistically my machine at work is not typical due to the PCI graphics, but
 it did apply enough stress to show where the I/O bottleneck is (on the main
 thread).  My machine at home is a dual p 2.4 with NVidia GeForce 5900XT.
 When testing other games on my home machine I get great frame rates, so my
 goal will be to bring OSG's performance to something comparable.

 Aside from the threading profiler, I have tested AMD CodeAnalyst to find
 the most frequently called code, and for OSG 1.2 it turned out to be the
 Matrix Multiply.  Aside from that, OSG itself took a significant bulk of the
 CPU workload.  This AMD profiler works differently in that it does not count
 sleeping or I/O time, but rather keeps note of the most frequently called
 code.  At some point I'll retest for code optimizations, but not yet... the
 real gain now is to balance the CPU rendering against sending to the GPU.
 It would be great if I can find a solution that can benefit the whole OSG
 community (all platforms).

 If anyone has an interest in faster performance using the new osgViewer,
 please share some ideas with me, thanks.  I can track where bottlenecks are,
 but working out a good solution will take some time to learn.  I'll need to
 profile with VS 9 and OpenMP to see if this helps.


 

Re: [osg-users] OSG thread profiling results are in!!

2008-06-28 Thread rpingry
Thanks Robert,

(James and I are working on this together.  He has decided to focus his
attention on understanding OpenSceneGraph and optimizing where he can,
while most of the client game code dealing with the scene graph is mine.)  I
was planning on asking some of these questions eventually anyway, but now
seems like a good time ;)

I think a huge part of what we are seeing is that we are relying very
heavily on image maps for everything, and that most graphics cards have to
swap memory.  That said, I know that I need to work on my dependence on
these image maps and look harder at resizing them where possible.  Are there
any other recommendations out there for more effective image map
utilization?  One thing I know I want to apply is LOD, which I have not done
yet.  From looking over the example code, LOD is pretty straightforward.
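
As a rough illustration of distance-based LOD selection, here is a sketch in standard C++ that does not use OSG. `LodRange` and `selectLevel` are hypothetical names. The half-open range test (min <= distance < max) follows how I understand the ranges given to `osg::LOD::addChild(node, min, max)` to behave. Unlike a real LOD node, which can draw several children if their ranges overlap, this sketch just returns the first match.

```cpp
#include <cstddef>
#include <vector>

// One level of detail: drawn while minRange <= distance < maxRange.
struct LodRange {
    double minRange;
    double maxRange;
};

// Returns the index of the first level whose range contains the distance,
// or -1 if the object is outside every range and should not be drawn.
int selectLevel(const std::vector<LodRange>& levels, double distance) {
    for (std::size_t i = 0; i < levels.size(); ++i) {
        if (distance >= levels[i].minRange && distance < levels[i].maxRange) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

For a space scene where most ships are far away, the last range would hold the cheapest model, and anything beyond it is skipped entirely.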

All our ships use UV mapping, and I was trying to make it so that the image
was only loaded once for the ship type, rather than for each instance.  It
did not seem that this made much of a difference, however.  I have started
to look at the osgimpostor example for help in how I might better handle
this.  Am I going in the right direction?

I have lots more questions, but I figure I will ask them as I get to them
and I am able to dig in myself.
Thanks again for all the great support,
-- Rick


Re: [osg-users] OSG thread profiling results are in!!

2008-06-28 Thread Robert Osfield
Hi Rick,

Sharing state is essential to good performance, and even more critical
when you start approaching memory limits.  You'll need to share
osg::Texture(s) rather than just osg::Image(s) to get the benefit.
Sharing complete osg::StateSet(s) is the most efficient: for cull, for
draw dispatch into the OpenGL FIFO (the OSG's draw traversal) and for
draw down on the GPU.

If you are hitting memory limits on the graphics card, beyond sharing
of Textures/StateSets, you could also look at using non-power-of-two
textures, and at using compressed texture formats, as these can stay
compressed on the graphics card.  Scaling your texture sizes to fit
your hardware limits helps too.
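
To put rough numbers on the compressed-texture point, here is a small sketch in standard C++. The function names are hypothetical. The block sizes are those of the S3TC/DXT formats (8 bytes per 4x4 block for DXT1, 16 bytes for DXT3/DXT5), and only a single mipmap level is counted.

```cpp
#include <cstddef>

// Bytes for one level of an uncompressed texture with 8 bits per channel.
std::size_t rgba8Bytes(std::size_t width, std::size_t height) {
    return width * height * 4;
}

// Bytes for one level of a block-compressed texture. S3TC/DXT stores each
// 4x4 pixel block in 8 bytes (DXT1) or 16 bytes (DXT3/DXT5).
std::size_t dxtBytes(std::size_t width, std::size_t height, std::size_t bytesPerBlock) {
    const std::size_t blocksWide = (width + 3) / 4;  // round up to whole blocks
    const std::size_t blocksHigh = (height + 3) / 4;
    return blocksWide * blocksHigh * bytesPerBlock;
}
```

A 1024x1024 texture takes 4 MiB as uncompressed RGBA, against 0.5 MiB as DXT1 or 1 MiB as DXT5, before mipmaps are added.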

My general guide would be to get your app running at a solid frame
rate (equal to your monitor's refresh rate); typically this will be
something like 75Hz on modern displays.  To hit this you might need to
be more conservative about just how much eye candy you are throwing at
the system, i.e. texture sizes, effects etc.  Once you've got your solid
frame rate on a given hardware, then look at what you can add without
breaking frame.  These days I see little excuse for not hitting a
solid 60+Hz for modern graphics apps, unless you have an app doing
something hard for the graphics hardware like volume rendering, or a
CAD app with millions of polygons in the scene.  If you aren't hitting
a solid frame rate then something's up and you need to address it.
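
The cost of "breaking frame" can be sketched with a little arithmetic in standard C++. The function names are hypothetical, and the second function assumes vsync with plain double buffering, where a frame that misses one refresh has to wait for the next one.

```cpp
#include <cmath>

// Time available per frame, in milliseconds, at a given refresh rate.
double frameBudgetMs(double refreshHz) {
    return 1000.0 / refreshHz;
}

// Displayed frame rate with vsync and double buffering: a frame that misses
// one refresh waits for the next, so the rate drops to refreshHz / n.
double vsyncedRateHz(double refreshHz, double frameTimeMs) {
    const double refreshesPerFrame = std::ceil(frameTimeMs / frameBudgetMs(refreshHz));
    return refreshHz / std::fmax(1.0, refreshesPerFrame);
}
```

At 60Hz the budget is about 16.7 ms per frame, so under these assumptions a frame that takes 17 ms is shown at 30Hz rather than 59Hz.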

There is a *huge* number of things you can do to make graphics go more
efficiently; one can only scratch the surface in a couple of emails.

Robert.


Re: [osg-users] OSG thread profiling results are in!!

2008-06-28 Thread James Killian



Thanks for the feedback.  I'll start with these areas as you suggested and 
see what numbers I am getting.


The thread profiler does provide detailed information of every threaded 
activity at any given time.  I just wish there was some way to present the 
information given that would be more meaningful to the group.


What would be great is to have a big balanced scene that can put OSG Viewer
to the test in a way that puts equally intense stress on update, culling,
and draw dispatch.  What I'd hope to see is the draw dispatch on a
separate thread, where that thread showed mostly I/O activity, with the CPU
activity on other threads.


Well anyhow... I've got my work cut out for me now... thanks again.


James Killian

Re: [osg-users] OSG thread profiling results are in!!

2008-06-28 Thread Robert Osfield
On Sat, Jun 28, 2008 at 4:35 PM, James Killian
[EMAIL PROTECTED] wrote:
 The thread profiler does provide detailed information of every threaded
 activity at any given time.  I just wish there was some way to present the
 information given that would be more meaningful to the group.

 What would be great is to have a big balanced scene that can put OSG Viewer
 to the test in a way that puts equally intense stress on update, culling,
 and draw dispatch.  What I'd hope to see is the draw dispatch on a
 separate thread, where that thread showed mostly I/O activity, with the CPU
 activity on other threads.

The osgViewer::StatsHandler will display update, event, cull and draw
dispatch stats on all systems, plus draw GPU stats.  The GPU stats require
an OpenGL extension that I've only seen Nvidia implement so far, so you
won't see these stats printed out on all systems.

Also record a camera path/game sequence that you can use for
benchmarking so that the app does the same thing on every run; then
you'll be able to study the effects that the changes you make have on
final performance.  You'll also be able to study the above stats to
see where the problems occur in your scene.
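
A repeatable benchmark of this kind can be sketched in standard C++ as below. `CameraSample`, `BenchmarkResult` and `replay` are hypothetical names, and the render callback stands in for whatever advances and draws one frame in the application. In OSG itself the path would more likely be recorded with `osgViewer::RecordCameraPathHandler` and played back through an animation path manipulator, which this sketch does not attempt to reproduce.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical camera sample; a real path would also carry orientation.
struct CameraSample {
    double x, y, z;
};

struct BenchmarkResult {
    std::size_t frames = 0;
    double averageMs = 0.0;
    double worstMs = 0.0;
};

// Replays the same recorded path every run and times each frame.
BenchmarkResult replay(const std::vector<CameraSample>& path,
                       const std::function<void(const CameraSample&)>& renderFrame) {
    using Clock = std::chrono::steady_clock;
    BenchmarkResult result;
    double totalMs = 0.0;
    for (const CameraSample& sample : path) {
        const Clock::time_point start = Clock::now();
        renderFrame(sample);
        const double ms =
            std::chrono::duration<double, std::milli>(Clock::now() - start).count();
        totalMs += ms;
        result.worstMs = std::max(result.worstMs, ms);
        ++result.frames;
    }
    if (result.frames > 0) {
        result.averageMs = totalMs / static_cast<double>(result.frames);
    }
    return result;
}
```

Because the path is fixed, the average and worst frame times from two runs can be compared directly after a scene graph change.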

As a small note, the OSG's CullDrawThreadPerContext,
DrawThreadPerContext and CullThreadPerCameraDrawThreadPerContext
threading models run graphics in a separate thread.

Robert.