Very interesting, thanks!

In particular about Rust helping avoid leaking, that's something I hadn't
thought of, very nice. However, complexity (as you mentioned) is not the
only reason the C++ optimizer leaks: it also does so to avoid memory
allocation overhead, so that allocation is always just bumping a pointer,
and freeing is a no-op. If you're not leaking, then I guess you have lists
of freed objects for reuse, or something similar? I wonder if it's possible
to measure that overhead. (For comparison, in the binaryen optimizer I've
focused on reusing nodes; there is still leaking when that isn't possible,
but it's relatively rare.)
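
The trade-off between bump-only allocation and recycling freed nodes can be
sketched roughly as follows. This is a minimal Rust illustration, not code
from either optimizer: `Arena`, `NodeId`, `Node` and `release` are
hypothetical names made up for the example. Never calling `release` gives
the bump-and-leak behaviour; calling it gives a simple free list.

```rust
/// Hypothetical AST node, used only for this illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Num(f64),
    Neg(NodeId),
}

/// Index of a node slot inside the arena.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NodeId(usize);

/// Arena that bumps by default and can optionally recycle released slots.
pub struct Arena {
    nodes: Vec<Node>,
    free: Vec<NodeId>, // slots released for reuse (the "free list")
}

impl Arena {
    pub fn new() -> Self {
        Arena { nodes: Vec::new(), free: Vec::new() }
    }

    /// Reuse a released slot if one exists, otherwise bump (push a new slot).
    pub fn alloc(&mut self, node: Node) -> NodeId {
        if let Some(id) = self.free.pop() {
            self.nodes[id.0] = node;
            id
        } else {
            self.nodes.push(node);
            NodeId(self.nodes.len() - 1)
        }
    }

    /// Mark a slot as reusable. Skipping this call is the "leak" strategy:
    /// the slot simply stays occupied until the whole arena is dropped.
    pub fn release(&mut self, id: NodeId) {
        self.free.push(id);
    }

    /// Number of slots ever allocated, which is what a leak would grow.
    pub fn slots(&self) -> usize {
        self.nodes.len()
    }
}
```

The extra cost of not leaking shows up as the `free.pop()` check on every
allocation and the bookkeeping in `release`, which is the overhead in
question.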

Memory-wise, yeah, keeping the 1-1 mapping to the Uglify AST definitely
hurt the C++ optimizer's memory usage. The binaryen optimizer does
something similar to what you said, with properly strongly-typed nodes. So
e.g. -4096 would be a Const node, currently taking 20 bytes (it could be
16, though).
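
To make the per-field versus per-node tagging difference concrete, here is
a rough Rust sketch of the two shapes for `-4096`. It is not the actual
`Value` type from the C++ optimizer nor the Ayzim or binaryen AST: `Value`,
`Expr` and the helper functions are hypothetical, and the sketch only counts
how many separately tagged cells each representation needs rather than
claiming any particular byte size.

```rust
/// Dynamically typed cell: every field of every node carries its own tag.
pub enum Value {
    Str(String),
    Num(f64),
    Array(Vec<Value>),
}

/// Strongly typed node: one tag per node, fields have statically known types.
pub enum Expr {
    Num(f64),
    UnaryPrefix(char, Box<Expr>),
}

/// Builds ["unary-prefix", "-", ["num", n]] out of dynamically typed cells.
pub fn dynamic_neg(n: f64) -> Value {
    Value::Array(vec![
        Value::Str("unary-prefix".to_string()),
        Value::Str("-".to_string()),
        Value::Array(vec![Value::Str("num".to_string()), Value::Num(n)]),
    ])
}

/// Builds the same expression out of typed nodes.
pub fn typed_neg(n: f64) -> Expr {
    Expr::UnaryPrefix('-', Box::new(Expr::Num(n)))
}

/// Counts tagged cells in the dynamic form (arrays count as one cell each).
pub fn count_values(v: &Value) -> usize {
    match v {
        Value::Array(items) => 1 + items.iter().map(count_values).sum::<usize>(),
        _ => 1,
    }
}

/// Counts tagged nodes in the typed form.
pub fn count_nodes(e: &Expr) -> usize {
    match e {
        Expr::Num(_) => 1,
        Expr::UnaryPrefix(_, inner) => 1 + count_nodes(inner),
    }
}
```

For `-4096`, `count_values(&dynamic_neg(4096.0))` sees six tagged cells
while `count_nodes(&typed_neg(4096.0))` sees two, which is where most of
the saving comes from.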

On Sun, Nov 6, 2016 at 11:25 AM, Aidan Hobson Sayers <[email protected]>
wrote:

> The memory improvements are basically down to the AST being more strongly
> typed in Ayzim - structurally it's the same (there's a simple 1-1 mapping
> between representations). For example, the uglifyjs AST representation of
> `-4096` is `["unary-prefix", "-", ["num", 4096]]`. This was translated
> faithfully to the C++ optimizer, so the arrays are `Value`s
> <https://github.com/kripken/emscripten/blob/1.36.14/tools/optimizer/simple_ast.h#L80>,
> which are dynamically typed by being a tagged union. The memory cost of a
> single `Value` is 16 bytes (`double` is the largest type in the union,
> and you pay that again to be able to hold the tag in the struct and pad
> it)...and then there's another `3*ptrlen` bytes to store the vector type
> somewhere if the `Value` is an array (typically to point to a child
> node...so just add this for every node since they're all children!). You
> then multiply the size of `Value` by the number of items in the array,
> which is two at minimum (ish). Overall, for an AST node you're paying
> `3*ptrlen + 32 + (16*N_additional_array_items)`.
>
> However, you know that `X` in `["num", X]` is a double, and that each
> AST node has a limited set of possible tags, so you can treat whole AST
> nodes as tagged unions, rather than the individual AST node fields. In
> Ayzim you end up paying `ptrlen + 32` for any single AST node (ish).
>
> On top of this, the Emscripten optimizer leaks memory
> <https://github.com/kripken/emscripten/blob/1.36.14/tools/optimizer/optimizer.cpp#L467>
> when replacing nodes (and elsewhere), probably because it's actually pretty
> tricky to keep track of what you're replacing, whether you've got another
> pointer to it hanging around somewhere and whether it's safe to deallocate.
> Leveraging the Rust ownership system made it quite tricky to translate
> parts of the C++ code, but the result is no memory leaks.
>
> Speedups were probably mostly from a) better overall optimization of the
> AST node tagged enum, b) more compact memory, c) using a string interning
> library with some interning at compile time (thanks to the Servo project)
> so some value comparisons could be inlined rather than going via pointer
> lookup, and d) a carefully chosen piece of low hanging fruit in
> registerizeHarder.
>
> The language helped in that I felt it gave me niceties to help me succeed
> (e.g. 'first class' tagged unions, exhaustiveness checking on matching
> tagged unions), it has a good library experience and the whole memory
> safety thing is nice. Downsides were the compile times, difficulties of
> translating highly unsafe C++ code and a number of language papercuts
> (lexical lifetimes in particular). Someone with full understanding of the
> optimizer and C++ would have been very able to do all the
> macro-optimizations in C++ so Rust isn't more powerful, but as a fallible
> human it helped me a lot and I felt much less like I was juggling chainsaws
> than when I've made my previous changes to the C++ optimizer :) For
> example, I'd feel pretty optimistic about my ability to add the duplicate
> function eliminator to it.
>
> I should have made my testing process clearer - during development I was
> diffing output of optimizing the sqlite and unity asm.js full library
> files. The only remaining differences I'm aware of are a) better float
> representation from ayzim and b) the three test cases here
> <https://github.com/aidanhs/ayzim/tree/0.1.2/testcases> (all three very
> minor - one ayzim is better at, one emscripten is better at, one ayzim gets
> very slightly wrong with float representation). If you spy other
> differences, let me know.
>
> On 5 November 2016 at 19:38, Alon Zakai <[email protected]> wrote:
>
>> Very cool!
>>
>> For those interested to check this out, you can just replace the existing
>> optimizer executable, and if you want to go back, just deleting the
>> replacement will make emcc rebuild the original one.
>>
>> Regarding those improvements to speed and memory use, I'm curious where
>> they come from - what were the changes you made? Small things, or large
>> structural changes to the AST? For comparison, the binaryen optimizer also
>> has some major improvements to speed and memory compared to the emscripten
>> asm.js one, and that's mostly from the redesigned AST - I'm curious if we
>> ended up doing similar things to improve on the old optimizer. Also, do you
>> think your choice of language had an effect here?
>>
>> Have you verified this generates the same output as the asm.js one, btw?
>> You mention it passes the test suite, but I'm also curious if it's
>> literally generating the same code as well.
>>
>> On Fri, Nov 4, 2016 at 8:08 PM, Aidan Hobson Sayers <[email protected]>
>> wrote:
>>
>>> Using Ayzim as a drop-in replacement for the Emscripten asm.js native
>>> optimizer when compiling an asm.js project of moderate or large size on
>>> `-O2` or `-O3` should result in a ~50-75% reduction in memory usage and a
>>> ~25-50% speedup when running asm.js native optimizer passes (i.e. most of
>>> the "js opts" stage as seen in EMCC_DEBUG output).
>>>
>>> To get it, download the compiled releases for Linux and Windows from the
>>> ayzim releases page <https://github.com/aidanhs/ayzim/releases>,
>>> extract them and replace (after backing up!) the existing optimizer(.exe)
>>> binary in `emsdk/emscripten/incoming_optimizer_64bit/` (if you're not
>>> on `incoming` but still feel brave, take a look at your emscripten config
>>> file, usually at `$HOME/.emscripten`, which should point you to the right
>>> place).
>>>
>>> --
>>>
>>> Some background: when I was trying to port a large application to asm.js
>>> about 6 months ago I had serious problems with the Emscripten asm.js
>>> optimizer - it would split the 750MB .js file into chunks and promptly
>>> consume all 8GB of my RAM by trying to optimize the chunks in parallel,
>>> swapping everything else out of memory and grinding the machine to a halt.
>>> I tackled this problem by taking a brief(!) diversion to rewrite the
>>> optimizer in Rust to be more memory efficient. Along the way I added a few
>>> speedups.
>>>
>>> Ayzim is probably an entry in the "well this might have been useful two
>>> years ago" section of software (since asm.js is 'shortly' going to be made
>>> redundant by wasm) but someone may find a use for it. For example, people
>>> wanting to understand the structure of the Emscripten optimizer ast may
>>> want to look at this code
>>> <https://github.com/aidanhs/ayzim/blob/0.1.2/src/cashew.rs#L141> and/or
>>> ask me since I'm very familiar with it now :)
>>>
>>> In time I may extend Ayzim to support wasm optimizations and move it to
>>> being more of a library, but that's for the future.
>>>
>>> Aidan
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "emscripten-discuss" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
>>>

