My own benchmarking basically shows that the problem is not JS performance 
but WebGL call overhead, so it is extremely important to reduce the number 
of calls into WebGL.

Most surprising to me is that a WebGL application on Windows can easily 
beat a native desktop OSX application, because the OSX OpenGL driver sucks 
so badly (at least on my MBP with an Intel HD 4000 driver). But staying on 
Windows, a native desktop application can easily have 10x more draw call 
throughput than a WebGL app running on the same machine, not because of 
slow JS performance, but because of WebGL overhead.

I have 3 test scenarios, all numbers are roughly "number of instances drawn 
per frame until frame rate drops below 60fps":

- naive drawing with unique draw calls (1 uniform update, 1 draw call, no 
other state changes in between), object positions are computed on CPU. Best 
case here is about 70k draws on Windows native OpenGL with NVIDIA. On OSX 
it starts to drop below 60fps at around 12k instances, and with WebGL it's 
between 5k and 6k draws (browser, platform and CPU don't matter in this 
case)

- next I tested drawing with ANGLE_instanced_arrays: object positions are 
computed on CPU, written to a (double-buffered) dynamic vertex buffer, and 
then rendered with a single draw call. In Chrome on Windows with NVIDIA I 
can get 450k instances before the performance drops below 60fps (so 450k 
particle position updates per frame in JS, and no sweat!). Performance in a 
native app isn't better here; my suspicion is that the vertex buffer update 
is the limiter (500k instances means 8MByte of dynamic vertex data 
shuffled to the GPU each frame). On my OSX MBP I can go up to about 180k 
instances (again very likely vertex throughput limited). The way the 
dynamic vertex buffer is updated also matters in this case: it looks like 
vertex buffer orphaning is useless in WebGL (see discussion here: 
https://groups.google.com/forum/#!topic/webgl-dev-list/vMNXSNRAg8M), 
so I switched to double-buffering

- finally I tried to do everything on the GPU with 2 passes: first evaluate 
particle positions in a fullscreen-quad fragment shader, then use 
vertex shader texture fetch to place the particles, also using instanced 
rendering. This goes up to about 800k instances on my Windows/NVIDIA 
machine in Chrome, but doesn't improve on my OSX machine (I guess the 
problem there is rendering to an RGBA32F render target, which might overload 
an Intel HD4000 a bit).
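
To make the double-buffering in the second scenario concrete, here is a 
minimal sketch, not the code from the demos. The interface BufferGL, the 
class DoubleBufferedInstances and the function instanceBytes are 
hypothetical names made up for illustration; only bindBuffer, bufferSubData 
and ARRAY_BUFFER are real WebGL names. It assumes both vertex buffers were 
already created with enough storage, and that one instance takes 4 floats 
(16 bytes), which is what the 8MByte figure for 500k instances implies.

```typescript
// Minimal slice of the WebGL buffer API used here (hypothetical interface;
// a real WebGLRenderingContext should satisfy it structurally).
interface BufferGL<B> {
  readonly ARRAY_BUFFER: number;
  bindBuffer(target: number, buffer: B | null): void;
  bufferSubData(target: number, offset: number, data: Float32Array): void;
}

// Hypothetical helper: alternates between two pre-allocated vertex buffers,
// so this frame's upload never targets the buffer written in the last frame.
class DoubleBufferedInstances<B> {
  private gl: BufferGL<B>;
  private buffers: [B, B];
  private frame = 0;

  constructor(gl: BufferGL<B>, buffers: [B, B]) {
    this.gl = gl;
    this.buffers = buffers;
  }

  // Uploads the instance data into this frame's buffer and returns that
  // buffer, still bound to ARRAY_BUFFER, ready for the instanced draw call.
  upload(data: Float32Array): B {
    const buf = (this.frame & 1) === 0 ? this.buffers[0] : this.buffers[1];
    this.frame++;
    this.gl.bindBuffer(this.gl.ARRAY_BUFFER, buf);
    this.gl.bufferSubData(this.gl.ARRAY_BUFFER, 0, data);
    return buf;
  }
}

// Bytes uploaded per frame, assuming 4 floats (16 bytes) per instance.
function instanceBytes(instances: number, floatsPerInstance = 4): number {
  return instances * floatsPerInstance * Float32Array.BYTES_PER_ELEMENT;
}
```

With these assumptions instanceBytes(500000) comes out at 8,000,000 bytes, 
so the cost per frame is two WebGL calls plus one large copy, independent of 
the instance count.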

Also, the PNaCl versions of the demos have about the same limitations as 
the WebGL version, pointing to the browser's GL wrapper as the bottleneck.

Here are the demo links:
- naive drawing with unique draw calls: 
http://floooh.github.io/oryol/DrawCallPerf.html
- hardware instanced rendering, CPU position updates: 
http://floooh.github.io/oryol/Instancing.html
- fully GPU rendered: http://floooh.github.io/oryol/GPUParticles.html

So in conclusion:
- JS performance is perfectly fine, both in Chrome and FF (even IE11 and 
the latest Safari)
- WebGL is the bottleneck, so minimize calls into WebGL as much as 
possible
- OSX OpenGL sucks ass, especially with an Intel GPU
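
To illustrate what "minimize calls into WebGL" means for the first two 
scenarios, here is a rough sketch that only counts calls against a stub 
instead of talking to a real context. The DrawGL interface, drawNaive, 
drawInstanced and countingGL are hypothetical names for this example. The 
method names mirror WebGL and the ANGLE_instanced_arrays extension, but in 
real WebGL 1 drawArraysInstancedANGLE lives on the extension object returned 
by getExtension, not on the context itself.

```typescript
// Hypothetical minimal interface covering only the calls counted below.
interface DrawGL {
  uniform4f(loc: number, x: number, y: number, z: number, w: number): void;
  drawArrays(mode: number, first: number, count: number): void;
  drawArraysInstancedANGLE(mode: number, first: number, count: number,
                           primcount: number): void;
}

const TRIANGLES = 0x0004; // same value as gl.TRIANGLES

// Scenario 1: one uniform update and one draw call per instance.
// `positions` holds 4 floats per instance.
function drawNaive(gl: DrawGL, loc: number, positions: Float32Array,
                   vertexCount: number): void {
  for (let i = 0; i < positions.length; i += 4) {
    gl.uniform4f(loc, positions[i], positions[i + 1],
                 positions[i + 2], positions[i + 3]);
    gl.drawArrays(TRIANGLES, 0, vertexCount);
  }
}

// Scenario 2: a single instanced draw call. Per-instance data is assumed
// to be already uploaded to a vertex buffer (not shown here).
function drawInstanced(gl: DrawGL, instanceCount: number,
                       vertexCount: number): void {
  gl.drawArraysInstancedANGLE(TRIANGLES, 0, vertexCount, instanceCount);
}

// Stub that counts how many times it is called instead of rendering.
function countingGL(): { gl: DrawGL; calls: () => number } {
  let n = 0;
  const gl: DrawGL = {
    uniform4f: () => { n++; },
    drawArrays: () => { n++; },
    drawArraysInstancedANGLE: () => { n++; },
  };
  return { gl, calls: () => n };
}
```

For 1000 instances the naive path makes 2000 calls through the stub and the 
instanced path makes 1, which is the difference the numbers above reflect.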

There's also a very handy benchmark table here from the bgfx engine which 
nicely shows what draw call performance to expect on various platforms:

https://github.com/bkaradzic/bgfx#17-drawstress

Cheers,
-Floh.

On Monday, 18 August 2014 17:11:34 UTC+2, Jean-Marc Le Roux wrote:
>
> Hi there!
>
> As part of the next beta of Minko, our x-platform, free and open source 3D 
> engine, we're working hard on some major performance improvements.
> I took some time to write a proto 3D engine to measure OpenGL "raw 
> performance". My goal was to:
> - see how to get the same results with an actual high level 3D engine;
> - see how WebGL performs compared to OpenGL thanks to Emscripten.
>
> TL;DR try it for yourself (results in the dev console):
>
> http://minko.io/wp-content/uploads/2014/08/minko-example-cube.html
>
> more details on the forum:
>
> http://minko.io/forums/topic/performance-improvement/
>
> Now the asm.js version is a lot slower than the native one: more than 10 
> times slower.
> It doesn't really fit with Emscripten's usual figures that are between 2 
> and 5 times slower than native.
>
> Is there anything we're doing wrong?
> Is WebGL the bottleneck?
>
> Anyway, feedback appreciated :)
>
