I see, makes sense! Pretty much the same problem as virtual methods / jump tables, which defeat dead-code elimination, I guess.
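The indirect-call effect Alon describes below, and its parallel to virtual methods defeating dead-code elimination, can be illustrated with a toy version of the analysis. This is a sketch under stated assumptions, not Binaryen's implementation. It uses a made-up call graph. It treats every function containing an indirect call as possibly reaching a sleeping function, which is the conservative assumption described in the thread. It also models a whitelist simply as an intersection with a user-supplied set (`only` is a hypothetical parameter, not a real Asyncify option).

```python
from collections import deque


def instrumented_functions(direct_calls, has_indirect_call, can_sleep, only=None):
    """Toy model of which functions an Asyncify-style pass must instrument.

    direct_calls: dict mapping each function to the functions it calls directly.
    has_indirect_call: functions containing at least one indirect call
        (virtual method call, call through a function pointer, jump table).
    can_sleep: functions that may unwind/rewind the stack themselves.
    only: hypothetical whitelist; if given, nothing outside it is instrumented.
    """
    # Reverse the call graph (callee -> callers) so we can walk up from sleepers.
    callers = {}
    for caller, callees in direct_calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    needs = set(can_sleep)
    # Conservative assumption: an indirect call might target any function, so
    # once anything can sleep, every function with an indirect call might too.
    if needs:
        needs |= set(has_indirect_call)

    # Anything that (transitively) calls an instrumented function is instrumented.
    work = deque(needs)
    while work:
        func = work.popleft()
        for caller in callers.get(func, ()):
            if caller not in needs:
                needs.add(caller)
                work.append(caller)

    # Toy whitelist: only the listed functions keep their instrumentation.
    if only is not None:
        needs &= set(only)
    return needs


# Hypothetical call graph of a small game.
calls = {
    "main": ["game_loop", "load_level"],
    "game_loop": ["update", "render", "wait_for_frame"],
    "update": [],          # dispatches entity logic through function pointers
    "render": ["draw_mesh"],
    "draw_mesh": [],
    "load_level": ["parse"],
    "parse": [],           # uses a jump table of handlers
    "wait_for_frame": [],  # the only function that really sleeps
}
sleepers = {"wait_for_frame"}

print(sorted(instrumented_functions(calls, set(), sleepers)))
print(sorted(instrumented_functions(calls, {"update", "parse"}, sleepers)))
print(sorted(instrumented_functions(calls, {"update", "parse"}, sleepers,
                                    only={"main", "game_loop", "wait_for_frame"})))
```

With indirect calls ignored, only the three functions on the path to `wait_for_frame` are instrumented. Treating `update` and `parse` conservatively pulls in those two plus `load_level` (six of eight functions), and the toy whitelist brings it back down to three. In a large C++ codebase with many virtual calls, the same conservative step applies to far more functions.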
On Monday, 22 July 2019 18:45:22 UTC+2, Alon Zakai wrote:
>
> Floh: the analysis works well in many cases, but the main problem is
> indirect calls: in a big enough codebase with enough of those, any
> indirect call looks like it can lead to something that sleeps :( We may
> indeed need a whitelist approach; some thinking is happening here:
> https://github.com/WebAssembly/binaryen/issues/2218
>
> On Mon, Jul 22, 2019 at 4:36 AM Floh <[email protected]> wrote:
>
>> Many thanks for the detailed breakdown :)
>>
>> Is this only using asyncify for an 'infinite game loop' (instead of a
>> frame callback), or is it also used for other synchronous calls like
>> file and network I/O? I'm trying to understand the reason behind the 50%
>> size increase. As far as I understand Alon's recent blog post on the topic,
>> there's a control-flow analysis happening, so only functions along
>> call stacks which need 'asyncification' would need to be instrumented, but
>> that might just be my overly optimistic interpretation ;) (So, for instance,
>> if only the main loop needed to be asyncified, the size increase should
>> be very small, but if there are synchronous I/O calls all over the place,
>> much more code would need to be instrumented, adding to the code size.)
>>
>> Cheers!
>> -Floh.
>>
>> On Monday, 22 July 2019 08:40:13 UTC+2, Gabriel CV wrote:
>>>
>>> Hi!
>>>
>>> I did some tests with the new Upstream/Asyncify feature (i.e.
>>> "Bysyncify") on the Doom 3 port.
>>>
>>> I am using Chrome 75/Ubuntu 18.04/nVidia binary drivers, and used the
>>> "timedemo demo1" command to measure the FPS (not available on the D3 demo
>>> though, too bad; I had to do this with the full version of the game).
>>>
>>> The good news: UPSTREAM/ASYNCIFY is working well! And it is easier to use
>>> than Emterpreter. However, there is a catch with the final wasm size. Here
>>> are the raw results:
>>>
>>> TARGET                      FPS   SIZE (MB)
>>> O2/FASTCOMP/EMTERPRETER (*)  50   4.55
>>> O2/UPSTREAM/ASYNCIFY         50   6.81
>>> O2/UPSTREAM (no Asyncify)    50   3.90
>>> O3/UPSTREAM/ASYNCIFY         51   6.96
>>> Os/UPSTREAM/ASYNCIFY         41   5.56
>>> Oz/UPSTREAM/ASYNCIFY         40   5.56
>>>
>>> (*) For reference. NB: I am using the whitelisting feature on EMTERPRETER.
>>>
>>> What to read from these numbers:
>>> - Performance
>>> -- FASTCOMP/EMTERPRETER and UPSTREAM/ASYNCIFY have a similar performance
>>> profile: with O2 optimization, both targets run at 50 FPS on average.
>>> -- ASYNCIFY has no impact on performance: with O2 optimization, UPSTREAM
>>> runs at 50 FPS on average both with and without it (NB: on the D3 port,
>>> I really tried to 'yield' as little as possible)
>>> -- There is, however, an important gap between Os/Oz and O2/O3: using Os
>>> leads to a roughly 20% performance hit compared to O2 (50-51 FPS with
>>> O2/O3 => 40-41 FPS with Os/Oz)
>>> -- O3 does not bring a significant performance improvement over O2
>>> -- Same thing for Oz compared to Os: both are almost the same
>>>
>>> - Binary size
>>> -- UPSTREAM/ASYNCIFY does have a big impact on final binary size: it is
>>> roughly a +50% increase (from 4.55 MB with O2/FASTCOMP/EMTERPRETER => 6.81
>>> MB with O2/UPSTREAM/ASYNCIFY)
>>> -- It really is ASYNCIFY that causes this binary size increase, as
>>> without ASYNCIFY, UPSTREAM produces a binary that is roughly 15% smaller
>>> than FASTCOMP (from 4.55 MB with FASTCOMP/EMTERPRETER => 3.90 MB with
>>> UPSTREAM)
>>> -- Using Os instead of O2 reduces the binary size (from 6.81 MB with
>>> O2 => 5.56 MB with Os), but it still does not match FASTCOMP (4.55 MB)
>>> -- Oz compared to Os brings no binary size improvement (both are 5.56 MB)
>>>
>>> So, all in all, my observation is that ASYNCIFY works well, but the
>>> binary size increase is not negligible (+50%).
>>> Using Os/Oz instead of O2/O3 reduces that overhead to some extent, but at
>>> the expense of a roughly 20% performance hit (at least on the D3 port),
>>> and it is still not on par with the FASTCOMP binary size.
>>>
>>> As it appears that it really is the Asyncify transformation that causes
>>> the binary size increase, the whitelisting feature could really bring the
>>> best of both worlds:
>>> - By default (that is, without whitelisting):
>>> -- Ease of use of ASYNCIFY compared to EMTERPRETER (this works *by
>>> default*, without any extra work)
>>> -- No performance impact from using ASYNCIFY (at least when using
>>> yield/sleep carefully)
>>> -- Cons: +50% binary size
>>> - With whitelisting:
>>> -- The binary size issue could be mitigated a lot, as UPSTREAM produces
>>> smaller binaries than FASTCOMP (-15% on D3)
>>> -- Cons: obviously, some work to do with whitelisting, but this is
>>> the same as with EMTERPRETER
>>>
>>> Here it is!
>>>
>>> --
>> You received this message because you are subscribed to the Google Groups
>> "emscripten-discuss" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/emscripten-discuss/c9d94058-7dc6-4c3f-9d56-59edbde20955%40googlegroups.com
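The percentages in Gabriel's summary are rounded, and they compare against different baselines (the Emterpreter build for "+50%", the upstream build without Asyncify for the cost of the transform itself). A quick recomputation from the raw figures makes the baselines explicit. This is a small sketch; the comparison pairs are picked for illustration and are not from the original post.

```python
def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# (average FPS in "timedemo demo1", wasm size in MB), as reported in the thread.
builds = {
    "O2/FASTCOMP/EMTERPRETER": (50, 4.55),
    "O2/UPSTREAM/ASYNCIFY": (50, 6.81),
    "O2/UPSTREAM (no Asyncify)": (50, 3.90),
    "O3/UPSTREAM/ASYNCIFY": (51, 6.96),
    "Os/UPSTREAM/ASYNCIFY": (41, 5.56),
    "Oz/UPSTREAM/ASYNCIFY": (40, 5.56),
}

FPS, SIZE = 0, 1  # tuple indices into each builds entry
comparisons = [
    (SIZE, "O2/FASTCOMP/EMTERPRETER", "O2/UPSTREAM/ASYNCIFY"),
    (SIZE, "O2/FASTCOMP/EMTERPRETER", "O2/UPSTREAM (no Asyncify)"),
    (SIZE, "O2/UPSTREAM (no Asyncify)", "O2/UPSTREAM/ASYNCIFY"),
    (SIZE, "O2/UPSTREAM/ASYNCIFY", "Os/UPSTREAM/ASYNCIFY"),
    (FPS, "O2/UPSTREAM/ASYNCIFY", "Os/UPSTREAM/ASYNCIFY"),
    (FPS, "O2/UPSTREAM/ASYNCIFY", "Oz/UPSTREAM/ASYNCIFY"),
]

for metric, base, other in comparisons:
    label = "size" if metric == SIZE else "FPS"
    change = pct_change(builds[base][metric], builds[other][metric])
    print(f"{label:>4}: {other} vs {base}: {change:+.1f}%")
```

This gives +49.7%, -14.3%, +74.6% and -18.4% for the size comparisons and -18.0% / -20.0% for FPS. Measured against the upstream build without Asyncify rather than against the Emterpreter build, the Asyncify transform on its own adds roughly three quarters to the O2 binary size.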
