Re: Doom 3 port / Asyncify test results

Floh Mon, 22 Jul 2019 09:58:54 -0700

I see, makes sense! Pretty much the same problem as virtual methods / jump 
tables which disable dead-code-elimination I guess.
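
To make the indirect-call problem concrete, here is a toy sketch of a
conservative "may sleep" analysis over a call graph. It is not Binaryen's
actual implementation. The function names (game_loop, wait_frame,
sleep_import, ...) and the helper functions_to_instrument are hypothetical,
and the rule that an indirect call may reach any address-taken function is a
simplifying assumption made only for illustration.

```python
from collections import deque


def functions_to_instrument(direct_calls, indirect_callers, address_taken, sleepers):
    """Return every function that may (transitively) reach a sleeping import.

    Toy model, not Binaryen's algorithm: an indirect call is assumed to be
    able to reach any function whose address is taken.
    """
    # Forward edges: direct calls plus the conservative indirect-call edges.
    callees = {f: set(targets) for f, targets in direct_calls.items()}
    for f in indirect_callers:
        callees.setdefault(f, set()).update(address_taken)

    # Reverse edges, so we can walk from the sleeping imports up to callers.
    callers = {}
    for f, targets in callees.items():
        for t in targets:
            callers.setdefault(t, set()).add(f)

    # Breadth-first walk backwards from everything that can sleep.
    reached = set(sleepers)
    queue = deque(sleepers)
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, ()):
            if caller not in reached:
                reached.add(caller)
                queue.append(caller)
    return reached - set(sleepers)


# Hypothetical call graph: only wait_frame calls the sleeping import directly.
direct_calls = {
    "main": ["game_loop"],
    "game_loop": ["render", "wait_frame"],
    "wait_frame": ["sleep_import"],
    "render": ["draw_mesh"],
    "draw_mesh": [],
    "on_event": ["wait_frame"],  # a callback whose address is taken
    "physics": [],
}

# Without indirect calls, only the call stacks above wait_frame are affected.
precise = functions_to_instrument(direct_calls, set(), set(), {"sleep_import"})

# One indirect call in draw_mesh drags the whole render path in as well.
conservative = functions_to_instrument(
    direct_calls, {"draw_mesh"}, {"on_event"}, {"sleep_import"}
)

print(sorted(precise))
print(sorted(conservative))
```

In this toy graph a single indirect call site is enough to add render and
draw_mesh to the set, which is the same effect that makes dead-code
elimination give up around virtual methods and jump tables.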


On Monday, 22 July 2019 18:45:22 UTC+2, Alon Zakai wrote:
>
> Floh: the analysis works well in many cases, but the main problem is 
> indirect calls - in a big enough codebase with enough of those, any 
> indirect call looks like it can lead to something that sleeps :( We may 
> indeed need a whitelist approach, some thinking is happening here: 
> https://github.com/WebAssembly/binaryen/issues/2218
>
> On Mon, Jul 22, 2019 at 4:36 AM Floh <[email protected]> wrote:
>
>> Many thanks for the detailed breakdown :)
>>
>> Is this only using asyncify for an 'infinite game loop' (instead of a 
>> frame callback), or is this also used for other synchronous calls like 
>> file- and network-I/O? I'm trying to understand the reason behind the 50% 
>> size increase. As far as I understand Alon's recent blog post on the topic, 
>> there's a control-flow analysis happening, so only functions along 
>> call-stacks which need 'asyncification' would need to be instrumented, but 
>> that might just be my overly optimistic interpretation ;) (so for instance, 
>> if only the main loop would need to be asyncified, the size increase should 
>> be very small, but if there are synchronous IO calls all over the place, 
>> much more code would need to be instrumented, adding to the code size).
>>
>> Cheers!
>> -Floh.
>>
>> On Monday, 22 July 2019 08:40:13 UTC+2, Gabriel CV wrote:
>>>
>>> Hi!
>>>
>>> I did some tests with the new Upstream/Asyncify feature (i.e. 
>>> "Bysyncify") on the Doom 3 port. 
>>>
>>> I am using Chrome 75/Ubuntu 18.04/nVidia binary drivers, and used the 
>>> "timedemo demo1" command to measure the FPS. Unfortunately that command 
>>> is not available in the D3 demo, so I had to do this with the full 
>>> version of the game.
>>>
>>> The good news: UPSTREAM/ASYNCIFY is working well! It is also easier to 
>>> use than Emterpreter. However, there is a catch: the final wasm size. 
>>> Here are the raw results:
>>>
>>> TARGET                      FPS    SIZE (MB)
>>> O2/FASTCOMP/EMTERPRETER     50     4,55    (for reference; uses the
>>>                                            EMTERPRETER whitelisting feature)
>>> O2/UPSTREAM/ASYNCIFY        50     6,81
>>> O2/UPSTREAM (no Asyncify)   50     3,90
>>> O3/UPSTREAM/ASYNCIFY        51     6,96
>>> Os/UPSTREAM/ASYNCIFY        41     5,56
>>> Oz/UPSTREAM/ASYNCIFY        40     5,56
>>>
>>> What to read from these numbers:
>>> - Performance
>>> -- FASTCOMP/EMTERPRETER and UPSTREAM/ASYNCIFY have a similar performance 
>>> profile: with O2 optimization, both targets average 50 FPS.
>>> -- ASYNCIFY has no measurable impact on performance: with O2 
>>> optimization, UPSTREAM averages 50 FPS both with and without it (NB: on 
>>> the D3 port, I really tried to 'yield' as little as possible)
>>> -- There is however an important gap between Os/Oz and O2/O3: using Os 
>>> leads to a roughly 20% performance hit compared to O2 (50 FPS with O2/O3 
>>> => 40 FPS with Os/Oz)
>>> -- O3 compared to O2 does not bring a significant performance improvement
>>> -- Same thing for Oz compared to Os: both are almost the same
>>>
>>> - Binary size
>>> -- UPSTREAM/ASYNCIFY does have a big impact on the final binary size: 
>>> roughly a +50% increase (from 4,55 MB with O2/FASTCOMP/EMTERPRETER => 6,81 
>>> MB with O2/UPSTREAM/ASYNCIFY)
>>> -- It is really ASYNCIFY that causes this binary size increase, as 
>>> without ASYNCIFY, UPSTREAM produces a binary that is 15% smaller than 
>>> FASTCOMP (from 4,55 MB with FASTCOMP/EMTERPRETER => 3,90 MB with UPSTREAM)
>>> -- Using Os instead of O2 reduces the binary size (from 6,81 MB with O2 
>>> => 5,56 MB with Os), but it still does not match FASTCOMP (4,55 MB) 
>>> -- Oz compared to Os does not bring a significant binary size improvement
>>>
>>> So, all in all, my observation is that ASYNCIFY works well, but the 
>>> binary size increase is not negligible (+50%). 
>>> Using Os/Oz instead of O2/O3 reduces that overhead to some extent, but 
>>> at the expense of a 20% performance hit (at least on the D3 port), and 
>>> the result is still not on par with the FASTCOMP binary size.
>>>
>>> As it appears that it is really the Asyncify transformation that causes 
>>> the binary size increase, a whitelisting feature could really bring the 
>>> best of both worlds:
>>> - By default (that is, without whitelisting):
>>>     - Ease of use of ASYNCIFY compared to EMTERPRETER (this works *by 
>>> default*, without any extra work)
>>>     - No performance impact from using ASYNCIFY (at least, when using 
>>> yield/sleep carefully)
>>>     - Cons: +50% binary size
>>> - With whitelisting:
>>>     - The binary size issue could be mitigated a lot, as UPSTREAM gives 
>>> a smaller binary size than FASTCOMP (-15% on D3)
>>>     - Cons: obviously, some work to do with whitelisting, but this is 
>>> the same as with EMTERPRETER
>>>
>>> Here it is!
>>>
>>>
>>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "emscripten-discuss" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected] <javascript:>.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/emscripten-discuss/c9d94058-7dc6-4c3f-9d56-59edbde20955%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/emscripten-discuss/c9d94058-7dc6-4c3f-9d56-59edbde20955%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>
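
The percentages quoted in Gabriel's results can be rechecked from the table
with a few lines of arithmetic. This is only a sketch: the sizes and FPS
values are copied from the table above (decimal commas written as points),
while the dictionary keys and the helper percent_change are names chosen
here for illustration. The last comparison, against the UPSTREAM build
without Asyncify, is not quoted as a percentage in the post, but it follows
from the same table.

```python
# Sizes in MB and FPS values taken from the results table in the thread.
sizes_mb = {
    "O2/FASTCOMP/EMTERPRETER": 4.55,
    "O2/UPSTREAM/ASYNCIFY": 6.81,
    "O2/UPSTREAM": 3.90,
    "Os/UPSTREAM/ASYNCIFY": 5.56,
}
fps = {
    "O2/UPSTREAM/ASYNCIFY": 50,
    "Os/UPSTREAM/ASYNCIFY": 41,
}


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Asyncify build versus the Emterpreter reference (the "+50%" in the post).
asyncify_vs_emterpreter = percent_change(
    sizes_mb["O2/FASTCOMP/EMTERPRETER"], sizes_mb["O2/UPSTREAM/ASYNCIFY"]
)

# Upstream without Asyncify versus fastcomp (the "15% smaller" in the post).
upstream_vs_fastcomp = percent_change(
    sizes_mb["O2/FASTCOMP/EMTERPRETER"], sizes_mb["O2/UPSTREAM"]
)

# Frame-rate cost of Os versus O2 (the "20% performance hit" in the post).
os_vs_o2_fps = percent_change(
    fps["O2/UPSTREAM/ASYNCIFY"], fps["Os/UPSTREAM/ASYNCIFY"]
)

# Asyncify overhead measured against the same backend without Asyncify.
asyncify_vs_plain_upstream = percent_change(
    sizes_mb["O2/UPSTREAM"], sizes_mb["O2/UPSTREAM/ASYNCIFY"]
)

for label, value in [
    ("Asyncify vs Emterpreter size", asyncify_vs_emterpreter),
    ("Upstream vs fastcomp size", upstream_vs_fastcomp),
    ("Os vs O2 frame rate", os_vs_o2_fps),
    ("Asyncify vs plain upstream size", asyncify_vs_plain_upstream),
]:
    print(f"{label}: {value:+.1f}%")
```

Measured against the non-Asyncify UPSTREAM build rather than the Emterpreter
build, the size overhead of the transformation itself comes out noticeably
higher than +50%, which is consistent with the point about indirect calls
causing most functions to be instrumented.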

