Floh: the analysis works well in many cases, but the main problem is
indirect calls. In a big enough codebase with enough of those, any
indirect call looks like it can lead to something that sleeps :( We may
indeed need a whitelist approach; some thinking is happening here:
https://github.com/WebAssembly/binaryen/issues/2218
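
To illustrate the indirect-call problem, here is a minimal toy model of a conservative "can this function reach a sleep?" analysis. It is only a sketch and not Binaryen's actual implementation: the function `may_reach_sleep`, the call graph, and every function name in it are hypothetical. The one assumption that matters is that an indirect call is treated as possibly reaching any function whose address is taken.

```python
from collections import deque


def may_reach_sleep(direct_calls, indirect_callers, address_taken, sleepers):
    """Toy conservative analysis: which functions might end up sleeping?

    direct_calls:     dict, function name -> set of directly called functions
    indirect_callers: set of functions that contain an indirect call
    address_taken:    set of functions that an indirect call could target
    sleepers:         set of functions known to sleep
    """
    # Assumption: an indirect call may reach every address-taken function.
    edges = {f: set(callees) for f, callees in direct_calls.items()}
    for f in indirect_callers:
        edges.setdefault(f, set()).update(address_taken)

    # Invert the graph so we can walk from the sleepers back to their callers.
    callers = {}
    for f, callees in edges.items():
        for g in callees:
            callers.setdefault(g, set()).add(f)

    # Breadth-first search backwards from everything known to sleep.
    reached = set(sleepers)
    queue = deque(sleepers)
    while queue:
        g = queue.popleft()
        for f in callers.get(g, ()):
            if f not in reached:
                reached.add(f)
                queue.append(f)
    return reached


# Hypothetical call graph, for illustration only.
direct_calls = {
    "main": {"game_loop"},
    "game_loop": {"render", "physics", "sleep"},
    "render": {"draw"},
    "physics": {"update_ai"},
    "update_ai": set(),     # only calls through a function pointer
    "draw": set(),          # only calls through a function pointer
    "on_event": {"sleep"},  # address-taken callback that really does sleep
}
sleepers = {"sleep"}

# Direct calls only: just the main-loop call stack and the callback are marked.
direct_only = may_reach_sleep(direct_calls, set(), set(), sleepers)

# With indirect calls in update_ai and draw, every function gets marked.
with_indirect = may_reach_sleep(
    direct_calls, {"update_ai", "draw"}, {"on_event"}, sleepers
)

print(sorted(direct_only))
print(sorted(with_indirect))
```

In this toy graph, two indirect call sites plus a single address-taken function that sleeps are enough to mark the whole program, which is why a whitelist that overrides the conservative result looks attractive.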

On Mon, Jul 22, 2019 at 4:36 AM Floh <[email protected]> wrote:

> Many thanks for the detailed breakdown :)
>
> Is this only using asyncify for an 'infinite game loop' (instead of a
> frame callback), or is this also used for other synchronous calls like
> file- and network-I/O? I'm trying to understand the reason behind the 50%
> size increase. As far as I understand Alon's recent blog post on the topic,
> there's a control-flow analysis happening, so only functions along
> call-stacks which need 'asyncification' would need to be instrumented, but
> that might just be my overly optimistic interpretation ;) (so for instance,
> if only the main loop would need to be asyncified, the size increase should
> be very small, but if there are synchronous IO calls all over the place,
> much more code would need to be instrumented, adding to the code size).
>
> Cheers!
> -Floh.
>
> On Monday, 22 July 2019 08:40:13 UTC+2, Gabriel CV wrote:
>>
>> Hi!
>>
>> I did some tests with the new Upstream/Asyncify feature (i.e. "Bysyncify")
>> on the Doom 3 port.
>>
>> I am using Chrome 75/Ubuntu 18.04/nVidia binary drivers, and used the
>> "timedemo demo1" command to measure the FPS (not available on the D3 demo
>> though, too bad. I had to do this with the full version of the game).
>>
>> The good news: UPSTREAM/ASYNCIFY is working well! It is also easier to use
>> than Emterpreter. However, there is a catch: the final wasm size. Here are
>> the raw results:
>>
>> TARGET                        FPS    SIZE (MB)
>> O2/FASTCOMP/EMTERPRETER [1]   50     4,55
>> O2/UPSTREAM/ASYNCIFY          50     6,81
>> O2/UPSTREAM (no Asyncify)     50     3,90
>> O3/UPSTREAM/ASYNCIFY          51     6,96
>> Os/UPSTREAM/ASYNCIFY          41     5,56
>> Oz/UPSTREAM/ASYNCIFY          40     5,56
>>
>> [1] For reference. NB: I am using the whitelisting feature on EMTERPRETER.
>>
>> What to read from these numbers:
>> - Performance
>> -- FASTCOMP/EMTERPRETER and UPSTREAM/ASYNCIFY have a similar performance
>> profile: with O2 optimization, both targets average 50 FPS.
>> -- ASYNCIFY has no measurable impact on performance: with O2 optimization,
>> UPSTREAM averages 50 FPS both with and without it (NB: on the D3 port, I
>> really tried to 'yield' as little as possible)
>> -- There is however an important gap between Os/Oz and O2/O3: using Os
>> leads to a roughly 20% performance hit compared to O2 (about 50 FPS with
>> O2/O3 => about 40 FPS with Os/Oz)
>> -- O3 compared to O2 does not bring a significant performance improvement
>> -- Same thing for Oz compared to Os: both are almost the same
>>
>> - Binary size
>> -- UPSTREAM/ASYNCIFY does have a big impact on final binary size: this is
>> roughly a +50% increase (from 4,55 MB with O2/FASTCOMP/EMTERPRETER => 6,81
>> MB with O2/UPSTREAM/ASYNCIFY)
>> -- It is really ASYNCIFY that causes this binary size increase, as
>> without ASYNCIFY, UPSTREAM produces a binary that is about 15% smaller than
>> FASTCOMP (from 4,55 MB with FASTCOMP/EMTERPRETER => 3,90 MB with UPSTREAM)
>> -- Using Os compared to O2 brings a binary size improvement (from 6,81 MB
>> with O2 => 5,56 MB with Os), but the result is still larger than the
>> FASTCOMP binary (4,55 MB)
>> -- Oz compared to Os does not bring a significant binary size improvement
>>
>> So, all in all, my observation is that ASYNCIFY works well, but the
>> binary size increase is not negligible (+50%).
>> Using Os/Oz instead of O2/O3 reduces that overhead to some extent, but
>> at the expense of a roughly 20% performance hit (at least on the D3 port),
>> and the result is still not on par with the FASTCOMP binary size.
>>
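As a quick check of the percentages quoted above, the following sketch recomputes them from the table. The sizes and FPS figures are copied from the results (decimal commas written as points); the helper `pct_change` and the variable names are just illustrative.

```python
def pct_change(new, old):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Binary sizes in MB, copied from the results table.
emterpreter_o2 = 4.55   # O2/FASTCOMP/EMTERPRETER
asyncify_o2 = 6.81      # O2/UPSTREAM/ASYNCIFY
upstream_o2 = 3.90      # O2/UPSTREAM (no Asyncify)
asyncify_os = 5.56      # Os/UPSTREAM/ASYNCIFY

comparisons = {
    "Asyncify (O2) vs Emterpreter (O2)": pct_change(asyncify_o2, emterpreter_o2),
    "Upstream (O2) vs Fastcomp/Emterpreter (O2)": pct_change(upstream_o2, emterpreter_o2),
    "Asyncify (O2) vs Upstream without Asyncify (O2)": pct_change(asyncify_o2, upstream_o2),
    "Asyncify (Os) vs Asyncify (O2)": pct_change(asyncify_os, asyncify_o2),
    "Asyncify (Os) vs Emterpreter (O2)": pct_change(asyncify_os, emterpreter_o2),
}

# Frame rates from the same table: Oz (40 FPS) against O2 (50 FPS).
fps_oz_vs_o2 = pct_change(40, 50)

for label, value in comparisons.items():
    print(f"{label}: {value:+.1f}%")
print(f"FPS, Oz vs O2: {fps_oz_vs_o2:+.1f}%")
```

Note that the +50% figure compares Asyncify against the Emterpreter build. Measured against the same UPSTREAM build without Asyncify (3,90 MB), the increase works out to about +75%.
>>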
>> As it appears that it is really the Asyncify transformation that causes
>> the binary size increase, the whitelisting feature could really bring the
>> best of both worlds:
>> - By default (that is, without whitelisting):
>>     - Ease of use of ASYNCIFY compared to EMTERPRETER (this works *by
>> default*, without any extra work)
>>     - No performance impact of using ASYNCIFY (at least, when using
>> yield/sleep carefully)
>>     - Cons: +50% binary size
>> - With whitelisting:
>>     - The binary size issue could be mitigated a lot, as UPSTREAM gives
>> a smaller binary size than FASTCOMP (-15% on D3)
>>     - Cons: obviously, some work to do with whitelisting, but this is the
>> same as with EMTERPRETER
>>
>> Here it is!
>>
>>
>> --
> You received this message because you are subscribed to the Google Groups
> "emscripten-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/emscripten-discuss/c9d94058-7dc6-4c3f-9d56-59edbde20955%40googlegroups.com
> <https://groups.google.com/d/msgid/emscripten-discuss/c9d94058-7dc6-4c3f-9d56-59edbde20955%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

