Very interesting, thanks! The point about Rust helping avoid leaks is something I hadn't thought of, very nice. However, besides the complexity you mentioned, the C++ optimizer leaks for another reason: to avoid memory allocation overhead. Allocation is always just bumping a pointer, and freeing is a no-op. If you're not leaking, then I guess you keep lists of freed objects for reuse, or something similar? I wonder if it's possible to measure that overhead. (For comparison, in the binaryen optimizer I've focused on reusing nodes; there is still leaking when that isn't possible, but it's relatively rare.)
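A minimal Rust sketch of the two strategies being contrasted here. All names are hypothetical; this is not code from the Emscripten, Ayzim or binaryen optimizers. A `Vec` of nodes addressed by index stands in for a bump allocator: `alloc` takes the next fresh slot, and in leaking mode `free` does nothing. In reuse mode, freed slots go on a free list and are handed out again. Comparing `slots_used()` after the same workload shows the memory side of the trade-off; the time cost of the extra push/pop per free/alloc would need a real benchmark.

```rust
/// Hypothetical AST node type, used only for this illustration.
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq)]
pub enum Node {
    Num(f64),
    Neg(usize), // child stored as an index into the arena
}

/// Arena of nodes addressed by index. With `reuse == false` it behaves like
/// a bump allocator: `alloc` always appends and `free` is ignored (the slot
/// leaks). With `reuse == true`, freed slots go on a free list for reuse.
pub struct Arena {
    slots: Vec<Node>,
    free_list: Vec<usize>,
    reuse: bool,
}

impl Arena {
    pub fn new(reuse: bool) -> Self {
        Arena { slots: Vec::new(), free_list: Vec::new(), reuse }
    }

    pub fn alloc(&mut self, node: Node) -> usize {
        if let Some(index) = self.free_list.pop() {
            self.slots[index] = node; // reuse a previously freed slot
            index
        } else {
            self.slots.push(node); // "bump": take the next fresh slot
            self.slots.len() - 1
        }
    }

    pub fn free(&mut self, index: usize) {
        if self.reuse {
            self.free_list.push(index);
        }
        // Leaking mode: the free is ignored and the slot is never reused.
    }

    pub fn get(&self, index: usize) -> &Node {
        &self.slots[index]
    }

    /// Total slots ever allocated (live plus leaked or free).
    pub fn slots_used(&self) -> usize {
        self.slots.len()
    }
}

/// Repeatedly replace a node with a fresh one, freeing the old node each time.
/// Returns the index of the final live node.
pub fn churn(arena: &mut Arena, rounds: usize) -> usize {
    let mut current = arena.alloc(Node::Num(0.0));
    for i in 0..rounds {
        let next = arena.alloc(Node::Num(i as f64));
        arena.free(current);
        current = next;
    }
    current
}
```

With 1000 replacements, the leaking arena ends up with 1001 slots, while the reusing one never needs more than 2.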
Memory-wise, yeah, keeping the 1-1 mapping to the Uglify AST definitely hurt the C++ optimizer's memory usage. The binaryen optimizer does something similar to what you describe, with properly strongly-typed nodes. For example, -4096 would be a Const node, currently taking 20 bytes (it could be 16, though).

On Sun, Nov 6, 2016 at 11:25 AM, Aidan Hobson Sayers <[email protected]> wrote:

> The memory improvements are basically down to the AST being more strongly typed in Ayzim - structurally it's the same (there's a simple 1-1 mapping between representations). For example, the uglifyjs AST representation of `-4096` is `["unary-prefix", "-", ["num", 4096]]`. This was translated faithfully to the C++ optimizer, so the arrays are `Value`s (<https://github.com/kripken/emscripten/blob/1.36.14/tools/optimizer/simple_ast.h#L80>), which are dynamically typed by being a tagged union. The memory cost of a single `Value` is 16 bytes (`double` is the largest type in the union, and you pay that again to be able to hold the tag in the struct and pad it)...and then there's another `3*ptrlen` bytes to store the vector type somewhere if the `Value` is an array (typically to point to a child node...so just add this for every node since they're all children!). You then multiply the size of `Value` by the number of items in the array, which is two at minimum (ish). Overall, for an AST node you're paying `3*ptrlen + 32 + (16*N_additional_array_items)`.
>
> However, you know that `X` in `["num", X]` is a double, and that each AST node has a limited set of possible tags, so you can treat whole AST nodes as tagged unions, rather than the individual AST node fields. In Ayzim you end up paying `ptrlen + 32` for any single AST node (ish).
>
> On top of this, the Emscripten optimizer leaks memory (<https://github.com/kripken/emscripten/blob/1.36.14/tools/optimizer/optimizer.cpp#L467>) when replacing nodes (and elsewhere), probably because it's actually pretty tricky to keep track of what you're replacing, whether you've got another pointer to it hanging around somewhere, and whether it's safe to deallocate. Leveraging the Rust ownership system made it quite tricky to translate parts of the C++ code, but the result is no memory leaks.
>
> Speedups were probably mostly from a) better overall optimization of the AST node tagged enum, b) less compact memory, c) using a string interning library with some interning at compile time (thanks to the Servo project) so some value comparisons could be inlined rather than going via pointer lookup, and d) a carefully chosen piece of low-hanging fruit in registerizeHarder.
>
> The language helped in that I felt it gave me niceties to help me succeed (e.g. 'first class' tagged unions, exhaustiveness checking on matching tagged unions), it has a good library experience, and the whole memory safety thing is nice. Downsides were the compile times, the difficulty of translating highly unsafe C++ code, and a number of language papercuts (lexical lifetimes in particular). Someone with a full understanding of the optimizer and C++ would have been perfectly able to do all the macro-optimizations in C++, so Rust isn't more powerful, but as a fallible human it helped me a lot, and I felt much less like I was juggling chainsaws than when making my previous changes to the C++ optimizer :) For example, I'd feel pretty optimistic about my ability to add the duplicate function eliminator to it.
>
> I should have made my testing process clearer - during development I was diffing the output of optimizing the sqlite and unity asm.js full library files.
> The only remaining differences I'm aware of are a) better float representation from Ayzim and b) the three test cases here (<https://github.com/aidanhs/ayzim/tree/0.1.2/testcases>), all three very minor: one Ayzim is better at, one Emscripten is better at, and one Ayzim gets very slightly wrong with float representation. If you spy other differences, let me know.
>
> On 5 November 2016 at 19:38, Alon Zakai <[email protected]> wrote:
>
>> Very cool!
>>
>> For those interested in checking this out, you can just replace the existing optimizer executable, and if you want to go back, just deleting the replacement will make emcc rebuild the original one.
>>
>> Regarding those improvements to speed and memory use, I'm curious where they come from - what were the changes you made? Small things, or large structural changes to the AST? For comparison, the binaryen optimizer also has some major improvements to speed and memory compared to the emscripten asm.js one, and that's mostly from the redesigned AST - I'm curious if we ended up doing similar things to improve on the old optimizer. Also, do you think your choice of language had an effect here?
>>
>> Have you verified that this generates the same output as the asm.js one, btw? You mention it passes the test suite, but I'm also curious whether it's literally generating the same code as well.
>>
>> On Fri, Nov 4, 2016 at 8:08 PM, Aidan Hobson Sayers <[email protected]> wrote:
>>
>>> Using Ayzim as a drop-in replacement for the Emscripten asm.js native optimizer when compiling an asm.js project of moderate or large size at `-O2` or `-O3` should result in a ~50-75% reduction in memory usage and a ~25-50% speedup when running asm.js native optimizer passes (i.e. most of the "js opts" stage as seen in EMCC_DEBUG output).
>>>
>>> To get it, download the compiled releases for Linux and Windows from the Ayzim releases page (<https://github.com/aidanhs/ayzim/releases>), extract them and replace (after backing up!) the existing optimizer(.exe) binary in `emsdk/emscripten/incoming_optimizer_64bit/` (if you're not on `incoming` but still feel brave, take a look at your emscripten config file, usually at `$HOME/.emscripten`, which should point you to the right place).
>>>
>>> --
>>>
>>> Some background: when I was trying to port a large application to asm.js about 6 months ago I had serious problems with the Emscripten asm.js optimizer - it would split the 750MB .js file into chunks and promptly consume all 8GB of my RAM by trying to optimize the chunks in parallel, swapping everything else out of memory and grinding the machine to a halt. I tackled this problem by taking a brief(!) diversion to rewrite the optimizer in Rust to be more memory efficient. Along the way I added a few speedups.
>>>
>>> Ayzim is probably an entry in the "well this might have been useful two years ago" section of software (since asm.js is 'shortly' going to be made redundant by wasm) but someone may find a use for it. For example, people wanting to understand the structure of the Emscripten optimizer AST may want to look at this code (<https://github.com/aidanhs/ayzim/blob/0.1.2/src/cashew.rs#L141>) and/or ask me, since I'm very familiar with it now :)
>>>
>>> In time I may extend Ayzim to support wasm optimizations and move it to being more of a library, but that's for the future.
>>>
>>> Aidan
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups "emscripten-discuss" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
>>> >> >> -- >> You received this message because you are subscribed to the Google Groups >> "emscripten-discuss" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to [email protected]. >> For more options, visit https://groups.google.com/d/optout. >> > > -- > You received this message because you are subscribed to the Google Groups > "emscripten-discuss" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > For more options, visit https://groups.google.com/d/optout. > -- You received this message because you are subscribed to the Google Groups "emscripten-discuss" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
