Re: [osg-users] CPU Performance issues with AMD 2700 vs Intel Corei7 4770S

2019-04-03 Thread Robert Osfield
Hi François,

On Wed, 3 Apr 2019 at 13:55, François Cami  wrote:
> Just to set expectations there:
> * there is no reclocking support for GPUs newer than Maxwell v2 in Nouveau
> * Turing support in Nouveau requires fairly recent builds (early 2019)
> * there is no acceleration, only llvmpipe.
>
> As all GTX 9xx and newer always run at their slowest clock, your best
> bet is always the NVIDIA drivers with these.

Thanks for the info.  We can look to NVidia for the lacklustre open
source/community support.

A bit of a sad state of affairs.

> On the AMD GPU side of things the opensource stack is in much better shape.

I will need to get an AMD graphics card to add to my testing mix as
things have come a long way since I last worked with AMD cards.

I'll be working on Vulkan RTX support this year and need the 2060 for
this, so will stick with what I have for now.

Cheers,
Robert.
___
osg-users mailing list
osg-users@lists.openscenegraph.org
http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org


Re: [osg-users] CPU Performance issues with AMD 2700 vs Intel Corei7 4770S

2019-04-03 Thread François Cami
On Wed, Apr 3, 2019 at 2:12 PM Robert Osfield  wrote:
>
> Hi All,
>
> I've now installed 18.04 on my new AMD2700+Geforce2060 system, ran a
> range of further tests and learnt a few things along the way.
>
> First up I tried out the open source graphics drivers that come with
> 18.04 and they do a really poor job of supporting the 2060: screen
> resolution is pegged at 1024x768 and, while the OSG compiled and ran
> just fine, for my small city test model I only get 39fps.  I couldn't
> work out how to get the Vulkan drivers working so didn't do any
> Vulkan tests.

Just to set expectations there:
* there is no reclocking support for GPUs newer than Maxwell v2 in Nouveau
* Turing support in Nouveau requires fairly recent builds (early 2019)
* there is no acceleration, only llvmpipe.

As all GTX 9xx and newer always run at their slowest clock, your best
bet is always the NVIDIA drivers with these.

> Second I installed the NVidia drivers.  Ubuntu/Kubuntu now requires a
> few more steps than it used to a few years back; it seems like they
> have a strong preference for the open source drivers, but as
> performance and support for modern cards really sucks I don't feel
> this is a great move.

On the AMD GPU side of things the opensource stack is in much better shape.


Re: [osg-users] CPU Performance issues with AMD 2700 vs Intel Corei7 4770S

2019-04-03 Thread Robert Osfield
Hi Chris et al.

On Wed, 3 Apr 2019 at 13:08, Chris Hanson  wrote:
>   He's GPU side. I tech-reviewed for the OpenGL SuperBible that he was one of 
> the authors of.
>
>   He might still be able to advise us on troubleshooting if we can't pin it 
> down. He's been very helpful before.

Thanks for the info.  For now I've got the new system working well
enough that I can get on and use it as my main dev system.  With the
changes to the setup today I've got the VSG performing in line with my
expectations.

On the OSG side, ouch, it's struggling far more than I expected.  I've
tried different CPU affinity tests using taskset, different compilers
and AMD Zen compile options and haven't seen the gap close between the
Intel and AMD.  From the CPU scene graph tests I've done it really
looks like the OSG heavy-weight node approach shows a far bigger
penalty on AMD vs Intel.  Credit to Intel for having a CPU
architecture that can handle such an awkward beast.  AMD clearly
haven't optimized the CPU architecture for software like the OSG.

As my "day job" is now pushing on with the VSG and the OSG is in
maintenance mode I am inclined to take note of this finding on the OSG
side, but not to spend further time trying to find improvements for
the OSG on AMD.

For users hitting up against CPU scene graph bottlenecks, especially
on AMD, I would suggest that the VSG will be the most effective way
to resolve them.  However, the VSG is still a long way from being
ready to be used in serious graphics applications; it's under heavy
development with large API changes each month.  In another 6 months
things will become more viable on the VSG side as we close in on 1.0,
but it'll still be quite a different API to the OSG so not a
straightforward drop-in replacement.  At this point, knowing that the
VSG is going to be available and will radically improve performance
should make it something to watch and plan for, especially for new
projects.

Cheers,
Robert.


Re: [osg-users] CPU Performance issues with AMD 2700 vs Intel Corei7 4770S

2019-04-03 Thread Robert Osfield
Hi All,

I've now installed 18.04 on my new AMD2700+Geforce2060 system, ran a
range of further tests and learnt a few things along the way.

First up I tried out the open source graphics drivers that come with
18.04 and they do a really poor job of supporting the 2060: screen
resolution is pegged at 1024x768 and, while the OSG compiled and ran
just fine, for my small city test model I only get 39fps.  I couldn't
work out how to get the Vulkan drivers working so didn't do any
Vulkan tests.

Second I installed the NVidia drivers.  Ubuntu/Kubuntu now requires a
few more steps than it used to a few years back; it seems like they
have a strong preference for the open source drivers, but as
performance and support for modern cards really sucks I don't feel
this is a great move.  Once I installed the NVidia drivers the frame
rate for the small city scene and standard path jumped to 368fps at
1920x1080, so roughly an order of magnitude better, and my dual
monitors now work fine too.

While exploring the different options in the GUI for the displays I
came across the toggle for switching off the compositor.  This used to
be alongside the desktop effects settings GUI, but has now moved to the
display settings.  Switching off the compositor suddenly let the hand
brake off and my new system started pushing frame rates higher than my
older Intel+Geforce1060 system.  Curiously the old system had the
compositor switched on and didn't see the same capping of frame rates
with the VSG/Vulkan.  I don't know why this is happening as they now
both have 18.04 installed; perhaps it's hardware, perhaps it's the
later NVidia drivers.  I'll look to upgrading the NVidia drivers on
the old system next.  Switching off the compositor on the Intel system
helps the max performance as well, but only by 25% rather than the
200% I saw on the new system.

Now that I've switched off the compositor on the AMD2700+Geforce 2060
system I'm seeing more predictable results between the two systems and
see patterns emerge.

                 Intel Core i7 4770S   AMD Ryzen7 2700
                 Geforce 1060          Geforce 2060
OSG@1920x1080    484fps                369fps   (28% slower)
VSG@1920x1080    2168fps               2697fps  (23% faster)
VSG@192x108      2712fps               2842fps  (4% faster)

So here we finally see the Geforce 2060 stretch its legs and beat the
1060 thanks to its better fill rate.
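
For anyone wanting to recompute the relative differences from the table, here is a minimal sketch.  The fps values are copied from the table above; the helper name percent_change is just an illustration.  Note that the percentage depends on which system is taken as the baseline, so the output need not match the rounded figures quoted in the table exactly.

```python
def percent_change(baseline_fps, new_fps):
    """Signed change of new_fps relative to baseline_fps, in percent."""
    return (new_fps - baseline_fps) / baseline_fps * 100.0


# (test, Intel i7 4770S + Geforce 1060 fps, AMD Ryzen7 2700 + Geforce 2060 fps)
# Values copied from the table in this message.
results = [
    ("OSG@1920x1080", 484, 369),
    ("VSG@1920x1080", 2168, 2697),
    ("VSG@192x108", 2712, 2842),
]

for name, intel_fps, amd_fps in results:
    amd_vs_intel = percent_change(intel_fps, amd_fps)  # Intel as baseline
    intel_vs_amd = percent_change(amd_fps, intel_fps)  # AMD as baseline
    print(f"{name}: AMD vs Intel {amd_vs_intel:+.1f}%, "
          f"Intel vs AMD {intel_vs_amd:+.1f}%")
```

Printing both conventions side by side makes it easier to see which baseline a quoted percentage was based on.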

The OSG's slow performance on the AMD chip, though, more than
overwhelms that gain, as it is significantly slower overall.  For
users that rely on OSG applications and are considering whether to
change CPU (Intel vs AMD) or invest in a new GPU, the choice of the
Intel CPU is going to be the far more critical change.

The results show that with the different approaches I've used in the
VSG for reducing node size and the complexity of traversal, along with
Vulkan, there just isn't the same AMD penalty that we see with the
OSG; instead we see the scaling we should expect from upgrading the
graphics hardware.

The difference between Intel and AMD isn't just down to OpenGL vs
Vulkan.  In developing the VSG I wrote two test programs, osggroups
and vsggroups, that both create a quad tree graph (11 levels deep by
default), traverse it 10x and then destruct it.  Here we can compare
like for like on pure CPU scene graph operations.

             Intel Core i7 4770S   AMD Ryzen7 2700
osggroups    3.77 secs             4.91 secs  (30% slower)
vsggroups    0.55 secs             0.55 secs  (almost identical!)

The results of the osggroups CPU test mirror the speed difference in
the osgviewer test with the small city model I've been using, so this
indicates that it's not just down to differences in OpenGL vs Vulkan
that we see differences in performance.
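
To make the shape of this kind of test concrete, here is a rough Python sketch of a create / traverse / destroy quad tree benchmark.  It is not the osggroups or vsggroups source; the Group class and function names are hypothetical stand-ins, and whether the root counts as one of the levels is an assumption.  A small default depth is used so it runs quickly.

```python
import time


class Group:
    """Hypothetical stand-in for a scene graph group node."""
    __slots__ = ("children",)

    def __init__(self):
        self.children = []


def create_quad_tree(levels):
    """Build a quad tree with `levels` levels; the last level has no children."""
    node = Group()
    if levels > 1:
        node.children = [create_quad_tree(levels - 1) for _ in range(4)]
    return node


def count_nodes(root):
    """Iterative traversal that visits every node exactly once."""
    count = 0
    stack = [root]
    while stack:
        node = stack.pop()
        count += 1
        stack.extend(node.children)
    return count


def run_benchmark(levels=6, traversals=10):
    """Time creation, repeated traversal and destruction of the graph."""
    start = time.perf_counter()
    root = create_quad_tree(levels)
    visited = 0
    for _ in range(traversals):
        visited += count_nodes(root)
    del root  # drop the only reference so the graph is destroyed
    elapsed = time.perf_counter() - start
    return visited, elapsed


if __name__ == "__main__":
    visited, elapsed = run_benchmark()
    print(f"visited {visited} nodes in {elapsed:.3f} secs")
```

Because nothing here touches the GPU, timings from a test of this shape only reflect CPU and memory behaviour, which is the point of the comparison above.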

The vsg results being nearly identical doesn't quite tell the full
story.  I've run more VSG related tests comparing double dispatch
visitors vs single dispatch visitors and find that the Intel chip sees
more of a penalty with double dispatch than the AMD chip.  The AMD
tests though show that the cost of destruction of the scene graph is
higher than on the Intel chip.  Things tend to balance out for the
vsggroups test though, it's more fluke than anything important.  The
key take away is that when you use the CPUs more efficiently, like the
VSG does compared to the OSG, the two chips both perform in an overall
similar way w.r.t. work per cycle.
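
For readers unfamiliar with the terms, the following Python sketch contrasts the two styles.  It is only an illustration of the general pattern, not the VSG or OSG classes; all names here are hypothetical, and what exactly counts as "single dispatch" in the VSG tests is an assumption on my part.

```python
class Visitor:
    """Base visitor with one apply method per node type."""

    def apply_node(self, node):
        pass

    def apply_group(self, group):
        # Default behaviour: continue the traversal into the children.
        for child in group.children:
            child.accept(self)


class Node:
    def accept(self, visitor):
        # First dispatch: on the node type (which accept() runs).
        # Second dispatch: on the visitor type (which apply_*() runs).
        visitor.apply_node(self)


class Group(Node):
    def __init__(self, children=()):
        self.children = list(children)

    def accept(self, visitor):
        visitor.apply_group(self)


class CountVisitor(Visitor):
    """Double dispatch: counts leaf nodes and groups separately."""

    def __init__(self):
        self.nodes = 0
        self.groups = 0

    def apply_node(self, node):
        self.nodes += 1

    def apply_group(self, group):
        self.groups += 1
        super().apply_group(group)


def count_single_dispatch(node):
    """Single function with an explicit type check instead of accept()."""
    if isinstance(node, Group):
        return 1 + sum(count_single_dispatch(c) for c in node.children)
    return 1


if __name__ == "__main__":
    scene = Group([Node(), Group([Node(), Node()])])
    visitor = CountVisitor()
    scene.accept(visitor)
    print(visitor.groups, visitor.nodes, count_single_dispatch(scene))
```

The double dispatch version costs two virtual calls per node, which is the kind of overhead the comparison above is measuring.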

Are the efficiencies that I've achieved with the VSG possible with
the OSG?  Unfortunately not without breaking key features.
osg::Nodes are significantly bigger than their vsg::Node counterparts
as the OSG nodes hold more optional data.  The OSG traversal also
checks more settings - like the NodeMask or the presence of an optional
StateSet that can be stored with all Nodes.  The osg::NodeVisitor has
more options that control its behavior so adds more work on the
traversal through the scene graph.  All these extra checks and memory
Re: [osg-users] CPU Performance issues with AMD 2700 vs Intel Corei7 4770S

2019-04-03 Thread Chris Hanson
> In this case I have an AMD CPU with an NVidia graphics card so a bit of
> a different set of things that could cause issues.
>
> > Do you know Graham Sellers at AMD? Maybe we could get some advising from
> > him.
>
> I don't know Graham, is he on the graphics or CPU side?


  He's GPU side. I tech-reviewed for the OpenGL SuperBible that he was one
of the authors of.

  He might still be able to advise us on troubleshooting if we can't pin it
down. He's been very helpful before.

-- 
Chris 'Xenon' Hanson, omo sanza lettere. xe...@alphapixel.com
http://www.alphapixel.com/
Training • Consulting • Contracting
3D • Scene Graphs (Open Scene Graph/OSG) • OpenGL 2 • OpenGL 3 • OpenGL 4 •
GLSL • OpenGL ES 1 • OpenGL ES 2 • OpenCL
Legal/IP • Forensics • Imaging • UAVs • GIS • GPS •
osgEarth • Terrain • Telemetry • Cryptography • LIDAR • Embedded • Mobile •
iPhone/iPad/iOS • Android
@alphapixel  facebook.com/alphapixel (775)
623-PIXL [7495]


Re: [osg-users] CPU Performance issues with AMD 2700 vs Intel Corei7 4770S

2019-04-02 Thread François Cami
On Tue, Apr 2, 2019 at 4:33 PM Robert Osfield  wrote:

Re: [osg-users] CPU Performance issues with AMD 2700 vs Intel Corei7 4770S

2019-04-02 Thread Robert Osfield
Hi François

Thanks for the suggestion and links.  I had been wondering whether
there was an issue with thread affinity, and your links illustrate how
this can be even worse than normal with the two groups of cores on the
Ryzen.

I have followed your suggestion of using taskset to set the affinity
to -c 0-7, and tried various other combinations, but don't see any
difference consistent enough to declare that it's having a positive
effect - the variation in frame rates between different runs is wider
than any difference taskset makes.  I've been trying runs of the same
software (osgviewer & osg2vsg viewer) with the same dataset on my
Intel and AMD boxes with different taskset settings, and if anything
the Intel benefits more, and its results are perhaps slightly more
consistent.  Differences are in the 1-2% range, but I can see as much
variation as this from run to run, so I'd say the results are not
statistically significant.
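
One simple way to check whether a taskset difference stands out from run-to-run noise is to compare the difference in mean frame rate against the spread within each set of runs.  The sketch below is a rough rule of thumb rather than a proper statistical test, the function name is hypothetical, and the fps samples in the example are made-up placeholders, not measurements from these systems.

```python
from statistics import mean, stdev


def difference_vs_noise(baseline_runs, pinned_runs, factor=2.0):
    """Return (mean difference, noise estimate, stands_out).

    stands_out is True only when the difference in means is larger than
    `factor` times the larger of the two sample standard deviations.
    """
    diff = mean(pinned_runs) - mean(baseline_runs)
    noise = max(stdev(baseline_runs), stdev(pinned_runs))
    return diff, noise, abs(diff) > factor * noise


if __name__ == "__main__":
    # Placeholder fps samples purely for illustration.
    without_taskset = [540, 551, 545, 549]
    with_taskset = [548, 543, 553, 546]
    diff, noise, stands_out = difference_vs_noise(without_taskset, with_taskset)
    print(f"difference {diff:+.2f} fps, noise {noise:.2f} fps, "
          f"stands out: {stands_out}")
```

With several runs per configuration this gives a quick yes/no on whether a 1-2% change is worth chasing.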

The tests I've been doing just now use an old city model that I have.
It's not a large model and doesn't use any advanced features, being
derived originally from a Creator/OpenFlight model.  The total number
of Geometries is 2268, Group nodes 1518, Transforms 552, Vertices and
Primitives 161,061 and 46,396 respectively, and finally 1105
StateSets.  When using the same animation path file, and rendering at
192x108 (1/100th of the pixels of my display, to avoid the fill
limit), this small city model renders at:

                 OSG       OSG(vfs)   VSG
core-i7-4770s    672fps    432fps     2845fps
AMD 2700         547fps    347fps     2320fps

The OSG(vfs) column is osgviewer running with just the view frustum
sides enabled, so that small feature culling is not enabled.  Small
feature culling makes quite a bit of difference to the OSG
performance on this model, so switching it off rather hobbles the OSG
for this dataset and animation path, but as the VSG doesn't have small
feature culling or LOD it's a better like for like comparison with
the VSG.  The OSG in these cases is running with DrawThreadPerContext
while the VSG is running single threaded.

The OSG with small feature culling sees a 22% slow down on the AMD 2700.
The OSG with small feature culling off sees a 26% slow down on the AMD 2700.
The VSG sees a 22% slow down on the AMD 2700.

This is all despite having a Geforce 2060 in the AMD box vs a 1060 in
the Intel one, so it looks strongly like a CPU related issue, and a
pretty consistent one given we have two totally different scene
graphs exhibiting a similar slow down.

For those curious about how much multi-threading helps the OSG,
running single threaded the results I get are:

                 OSG       OSG(vfs)   VSG
core-i7-4770s    483fps    322fps     2845fps
AMD 2700         399fps    256fps     2320fps

So DrawThreadPerContext vs SingleThread with small feature culling is
39% faster on the Intel, 37% faster on the AMD.
While DrawThreadPerContext vs SingleThread without small feature
culling is 34% faster on the Intel, 35% faster on the AMD.
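
The threading speedups can be recomputed from the two tables with a few lines of Python.  This is just a sketch: the fps values are copied from the tables in this message, the helper name is made up, and depending on rounding vs truncation the output may differ by a point from the figures quoted above.

```python
# fps figures from the two tables: (DrawThreadPerContext, SingleThreaded)
osg_fps = {
    ("core-i7-4770s", "small feature culling on"): (672, 483),
    ("core-i7-4770s", "small feature culling off"): (432, 322),
    ("AMD 2700", "small feature culling on"): (547, 399),
    ("AMD 2700", "small feature culling off"): (347, 256),
}


def speedup_percent(threaded_fps, single_fps):
    """How much faster the threaded run is than the single threaded run."""
    return (threaded_fps / single_fps - 1.0) * 100.0


for (cpu, mode), (threaded, single) in osg_fps.items():
    print(f"{cpu}, {mode}: {speedup_percent(threaded, single):.1f}% faster")
```

Both chips gain a similar amount from DrawThreadPerContext, which suggests the threading model itself is not where the Intel vs AMD gap comes from.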

The figures for the VSG are the same in the above tests as I haven't
yet tackled multi-threading.  The VSG also doesn't yet do depth
sorting of the transparent objects or billboarding, so it's not
quite a 1:1 match up for visuals; these features will slow the VSG by
some %, while multi-threading will get us back some.  The VSG hasn't
been optimized either, so it's too early to draw any conclusions from
the figures beyond comparing Intel vs AMD, and that the VSG with Vulkan
is going to be significantly faster than the OSG/OpenGL for models
that have lots of separate geometries and state.

Another oddity is that in the osg2vsg test app I've added a test of
not filling in the command buffer each frame, instead just
resubmitting the same command buffer each frame; when I do this I
remove the VSG's command dispatch traversal.  It doesn't update the
eye point so it's a pretty crappy test in terms of something that is
widely useful in real applications, but it can provide some insight
into the CPU overhead associated with the scene graph traversal and
filling the command buffer.  On the Intel system I see the frame rate
jump from 2800fps to 6700fps, while on the AMD system the frame rate
stays around the 2200-2300fps point and if anything actually averages
slightly lower.  This is a really odd result on the AMD system and
suggests that perhaps the driver/OS is doing something odd, as I'm
running exactly the same VSG version and data on both systems.

At this point I'd love to get some better performance parity between
Intel and AMD as a ~25% penalty is far larger than I was expecting.
Yesterday I tried out clang-7 on my AMD box and didn't spot any
notable differences from gcc 8.2.  Now the taskset tests suggest that
thread migration between cores is not a significant issue for these
particular scene graph tests under Linux.

Where to look next?  Happy to take suggestions.

I'm now going to install Kubuntu 18.04 on the AMD system to see if
18.10 and the later NVidia 

Re: [osg-users] CPU Performance issues with AMD 2700 vs Intel Corei7 4770S

2019-04-02 Thread François Cami
Hi,

On Tue, Apr 2, 2019 at 8:52 AM Robert Osfield  wrote:
>
> Hi Chris,
>
> On Mon, 1 Apr 2019 at 21:41, Chris Hanson  wrote:
> > I don't have any modern Apples to Apples comparisons to cite.
> >
> > I do know, that if you recall, we ran into threading issues when dealing 
> > with the massive plenoptic display IG back in 2015 or so on AMD cards, and 
> > it needed special assistance from AMD to unclog it.
>
> In this case I have an AMD CPU with an NVidia graphics card so a bit of
> a different set of things that could cause issues.
>
> One test I did yesterday afternoon was to try clang-7 on my new AMD
> system but I didn't see any notable performance difference.

Could it be the inter-CCX latency specific to Ryzen?
https://community.amd.com/thread/215519

In this case you might want to use taskset. The following should limit
the command to the first CCX if HT is enabled:
taskset -c 0-7 
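
For test programs where wrapping the command in taskset is awkward, the same restriction can be applied from inside a process.  A minimal sketch for Linux using Python's os.sched_setaffinity follows; the helper name is hypothetical, and the assumption that logical CPUs 0-7 map to the first CCX is taken from the suggestion above and should be checked against the actual topology of the machine.

```python
import os


def pin_to_cpus(requested):
    """Restrict the current process to the requested logical CPUs (Linux).

    Only CPUs the process is already allowed to run on are kept, so the
    call does not fail on machines with fewer cores.  Returns the
    resulting affinity set.
    """
    allowed = os.sched_getaffinity(0)
    target = set(requested) & allowed
    if not target:
        # Nothing usable was requested; leave the affinity unchanged.
        return allowed
    os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # Assumption: logical CPUs 0-7 belong to the first CCX.
    print(sorted(pin_to_cpus(range(8))))
```

Child processes started afterwards inherit the affinity, so a benchmark launched from such a script is pinned in the same way as with taskset.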

Cheers
François

> > Do you know Graham Sellers at AMD? Maybe we could get some advising from 
> > him.
>
> I don't know Graham, is he on the graphics or CPU side?
>
> Cheers,
> Robert.


Re: [osg-users] CPU Performance issues with AMD 2700 vs Intel Corei7 4770S

2019-04-02 Thread Robert Osfield
Hi Chris,

On Mon, 1 Apr 2019 at 21:41, Chris Hanson  wrote:
> I don't have any modern Apples to Apples comparisons to cite.
>
> I do know, that if you recall, we ran into threading issues when dealing with 
> the massive plenoptic display IG back in 2015 or so on AMD cards, and it 
> needed special assistance from AMD to unclog it.

In this case I have an AMD CPU with an NVidia graphics card so a bit of
a different set of things that could cause issues.

One test I did yesterday afternoon was to try clang-7 on my new AMD
system but I didn't see any notable performance difference.

> Do you know Graham Sellers at AMD? Maybe we could get some advising from him.

I don't know Graham, is he on the graphics or CPU side?

Cheers,
Robert.


Re: [osg-users] CPU Performance issues with AMD 2700 vs Intel Corei7 4770S

2019-04-01 Thread Chris Hanson
I don't have any modern Apples to Apples comparisons to cite.

I do know, that if you recall, we ran into threading issues when dealing
with the massive plenoptic display IG back in 2015 or so on AMD cards, and
it needed special assistance from AMD to unclog it.

Do you know Graham Sellers at AMD? Maybe we could get some advising from
him.



On Mon, Apr 1, 2019 at 2:59 PM Robert Osfield wrote:

> Hi All,
>
> I have recently got a new dev machine, an AMD2700 + Geforce 2060
> system. I wanted a quiet multi-core system as well as to get some
> experience working with AMD chips as I've been using Intel Corei7
> chips for the last 8 years.  The machine uses passive cooling for
> everything except the graphics card so it's so quiet you hardly know
> it's on :-)
>
> I've begun performance testing and was expecting improvements
> across the board for single threaded tasks, and big improvements on
> highly threaded tasks like building.   I expected this based on a
> number of online performance comparisons.  For example:
>
>
> https://cpu.userbenchmark.com/Compare/Intel-Core-i7-4770S-vs-AMD-Ryzen-7-2700/m2123vs3957
>
> http://hwbench.com/cpus/amd-ryzen-7-2700-vs-intel-core-i7-4770s
>
> When using the OSG I'm seeing lower framerates for almost all the test
> models I regularly use, in particular the cull traversal can be 20-40%
> slower.
>
> Synthetic OSG and VSG scene graph creation and traversal tests show
> the OSG 32% slower on the AMD (running osggroups from the osg2vsg
> project) while the equivalent VSG test is ~14% faster on the AMD
> (running vsgvisitor from the vsgExamples project).
>
> Looking at the Linux perf stat utility, the OSG runs on the AMD are
> hitting 498 context switches for the run of osggroups, while the
> Intel just sees 24 context switches.  Branch misses are 0.66% on AMD
> vs 0.37% on Intel.  Instructions per cycle are worse on the AMD too,
> 0.72 (AMD) vs 0.95 (Intel) instructions/cycle.
>
> The equivalent tests with the VSG show that the AMD has 56
> context switches vs 2 on the Intel.  Branch misses are also still
> higher on the AMD, but the instructions per cycle are now 2.13 (AMD)
> vs 1.96 (Intel) instructions/cycle.
>
> Another way of presenting this data: these OSG vs VSG tests create
> an 11 level deep quad tree, traverse every node in the graph 10
> times, then destroy the graph:
>
> VSG Intel 0.615 secs  (VSG 6.33 x faster than OSG)
> OSG Intel 3.895 secs
>
> VSG AMD 0.555 secs (VSG 9.36 x faster than OSG)
> OSG AMD 5.196 secs
>
>
> Despite the raw creation, traversals and destruction being quicker on
> the AMD with the VSG, the VSG frame rates on scenes are worse on the
> AMD vs Intel, except where the 2060's better fill rate gives it an
> advantage.  Even when I take the VSG's traversals out of the equation
> and just submit the same command graph to render each frame the
> frame rate is worse on the AMD vs Intel system, suggesting it's not
> just the slower traversals that are at issue - the Vulkan driver and
> hardware look to be struggling as well.
>
> The two systems aren't identical software wise: the Intel system is on
> Kubuntu 18.04 with gcc 7.3.0 vs the AMD system on Kubuntu 18.10 with
> gcc 8.2.0.  The Intel system has NVidia 396.54.09 vs the AMD system
> with NVidia 415.27.
>
> I wouldn't have thought the OS, compiler and drivers would make such
> stark differences in performance.  The VSG's lighter-weight design
> does seem to help reduce the impact a bit and for some tests gives
> the AMD an advantage, but it's still overall slower on the AMD.  The
> OSG sees a huge hit on complex scenes on the AMD system.
>
> So I'm perplexed what might be going on.  The perf stat report
> suggests that the AMD is running at around 3.9-4GHz vs Intel running
> at 3.8 to 3.GHz.  The perf stat results suggest that certain
> workloads are really messing up the AMD chip's ability to efficiently
> handle the load.
>
> On one upside, the AMD's 8 cores/16 threads vs Intel's 4cores/8
> threads certainly helps when compiling the OSG, but this is small
> comfort as my job is far more focused on optimizing real-time
> rendering performance than just getting code compiled.
>
> I'd really like some feedback from the community as to what might be
> going wrong on my AMD system.  Are others seeing significantly slower
> CPU performance on AMD vs Intel for certain tasks?
>
> Any suggestions as to what tests to do next?  What to tweak?
>
> Cheers,
> Robert.
>

