Re: Separate memory init file enlarges overall code size

Chad Austin Sun, 21 Dec 2014 21:11:32 -0800

On Sun, Dec 21, 2014 at 10:22 PM, Soeren Balko <[email protected]> wrote:


> So far, my tryout implementation is based on a script that I run using
> --js-transform. It uses regular expressions to find integer arrays and
> replaces them with some base64 string and a function wrapper around them to
> turn them into an int8 array. I like the ministr approach, as it preserves
> the (printable) byte sequences (which benefits the readability of string
> literals) and apparently speeds up parsing. If only they had provided
> their escaping code for non-printable characters.
>
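The array-to-base64 rewrite described above could look roughly like the following minimal Python sketch. It is not Soeren's script. It assumes the arrays appear literally as `allocate([...], ...)` calls, and that the transform command receives the JS file path as its only argument and rewrites the file in place. `decodeBase64Bytes` is a hypothetical JS helper that would have to be emitted separately; neither the pattern nor the helper name comes from Emscripten itself.

```python
import base64
import re
import sys

# Hypothetical pattern: an integer array literal passed as the first argument
# of an allocate([...], ...) call in the generated JS.
ALLOCATE_ARRAY = re.compile(r"allocate\(\[([-0-9,\s]*)\]")


def array_to_base64(body: str) -> str:
    """Convert the text '72,105,0,-1' into base64 of the matching bytes."""
    values = [int(v) for v in body.split(",") if v.strip()]
    # Signed int8 values such as -1 are wrapped into the 0..255 range.
    return base64.b64encode(bytes(v & 0xFF for v in values)).decode("ascii")


def transform(js: str) -> str:
    """Replace each allocate([...]) array literal with a call to
    decodeBase64Bytes, a hypothetical JS helper emitted elsewhere."""
    def repl(m):
        return 'allocate(decodeBase64Bytes("%s")' % array_to_base64(m.group(1))
    return ALLOCATE_ARRAY.sub(repl, js)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Assumed invocation: the transform receives the JS file path and
    # rewrites that file in place.
    path = sys.argv[1]
    with open(path) as f:
        source = f.read()
    with open(path, "w") as f:
        f.write(transform(source))
```

Base64 output is plain ASCII, so it is safe for any JS engine, at the cost of 4 output characters per 3 input bytes plus the decoding step at startup.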

Here is the code I wrote for my tests:
https://github.com/chadaustin/Web-Benchmarks/blob/master/meminit/meminit.py

Evan pointed out that my code is incorrect in the case of an octal escape
followed by numeric digits, but I don't think he posted his code.
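One way to sidestep the octal pitfall mentioned above (a "\0" escape followed by a literal digit is read as a longer legacy octal escape) is to emit every non-printable byte as a fixed-width \xHH escape. The following is a minimal Python sketch under that assumption, not the code from meminit.py and not Evan's fix:

```python
def to_js_string_literal(data: bytes) -> str:
    """Encode bytes as a double-quoted JS string literal in which each
    character's code (0..255) equals one byte of the input."""
    out = ['"']
    for b in data:
        if b == 0x22:            # double quote must be escaped
            out.append('\\"')
        elif b == 0x5C:          # backslash must be escaped
            out.append('\\\\')
        elif 0x20 <= b <= 0x7E:  # printable ASCII is emitted as-is
            out.append(chr(b))
        else:
            # \xHH always consumes exactly two hex digits, so a following
            # digit can never be absorbed into the escape.
            out.append('\\x%02x' % b)
    out.append('"')
    return "".join(out)
```

For zero-heavy data this costs four characters per 0x00 byte; emitting "\0" whenever the next byte is not an ASCII digit would be a possible refinement.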


> Also, I still need to figure out where exactly the "allocate([....], ...)"
> calls are generated and change the code there.
>
> If only for the sake of speeding up the JS parser, I wonder if some basic
> inline RLE compression could be done as well. It would most probably not
> help with the gzipped file, but it would keep the uncompressed JS file
> smaller and potentially speed up parsing, at the expense of a small runtime
> overhead to expand the RLE-encoded byte sequences into a region on the heap.
>
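The RLE idea above can be illustrated with the simplest possible run-length scheme: (run_length, value) pairs, expanded into a bytearray that stands in for the heap. This is a hypothetical sketch, not something Emscripten does; a practical encoder would also need a literal-run form so that non-repeating data does not double in size.

```python
from typing import List, Tuple


def rle_encode(data: bytes) -> List[Tuple[int, int]]:
    """Collapse runs of identical bytes into (run_length, value) pairs."""
    runs: List[Tuple[int, int]] = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        runs.append((j - i, data[i]))
        i = j
    return runs


def rle_expand_into(heap: bytearray, offset: int, runs) -> int:
    """Expand the runs into heap starting at offset; return bytes written."""
    pos = offset
    for length, value in runs:
        heap[pos:pos + length] = bytes([value]) * length
        pos += length
    return pos - offset
```

Long zero runs collapse to a single pair, which is where a scheme like this would pay off; the extra loop at startup is the runtime overhead being traded for a smaller file.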

Hm, I wonder if the improved JS parse time would be offset by the more
complex decoding / startup JITting.  Probably worth measuring.

Either way, a straight up string literal would be a huge improvement over
the status quo for people who can't or don't want to use a separate meminit
binary file.

Thanks for investigating this.  :)


> Soeren
>
>
> On Monday, December 22, 2014 7:58:26 AM UTC+10, Chad Austin wrote:
>>
>> Hi Soeren,
>>
>> @evanw and I have done similar research on this issue:
>> https://github.com/kripken/emscripten/issues/2188
>>
>> If we represent the meminit block as a large string literal rather than
>> an array of 8-bit numbers, it would reduce code size by about 50%, improve
>> JavaScript parse time, AND make it more readable, as C string literals
>> would be visible in the output.
>>
>> Fixing this has been on our wishlist for some time and if you want to
>> take a crack at it, we would be thrilled!
>>
>> Let me know if there's anything we can do to help,
>> Chad
>>
>>
>> On Sat, Dec 20, 2014 at 11:48 PM, Soeren Balko <[email protected]> wrote:
>>
>>> I played around with the separate memory init file and was surprised to
>>> see that it does, in fact, increase the total code size. The numbers I
>>> got are:
>>>
>>> * JS with inline memory initialization: 23186642 bytes
>>> * JS and separate memory init file:  15250276+8988744 = 24239020 bytes
>>>
>>> That's a bit surprising to me, as I would expect the binary memory init
>>> file to spend one byte per, well, byte in HEAP8. Also, the inline memory
>>> initializer is a plain JS array, which is unnecessarily large (each value
>>> takes 1-3 bytes for its digits plus 1 byte for the comma). If the
>>> initial memory values were encoded as a UTF-8 string (and retrieved at
>>> runtime using String.charCodeAt), there would be only 1-2 bytes per
>>> "entry" (= byte on the heap), or 1.5 bytes on average if the memory init
>>> values are uniformly distributed. Of course, that would produce
>>> non-printable characters in the generated JS file, and I'm not sure all
>>> JS interpreters would like that. If not, base64 (or basE91 for less
>>> overhead - see http://base91.sourceforge.net/) would still use less
>>> space in the JS file.
>>>
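As a rough check of the per-byte estimates above, the following Python sketch computes the average cost of each encoding, assuming byte values are uniformly distributed over 0-255. That is an idealization: real memory-init data is likely not uniform (it often contains long runs of zeros), and gzip changes the picture again.

```python
# Average encoded size per heap byte, assuming uniformly distributed values.

def array_literal_cost(b: int) -> int:
    """Bytes for one element of a JS array literal: digits plus a comma."""
    return len(str(b)) + 1


def utf8_cost(b: int) -> int:
    """Bytes for code point b (0..255) when the JS file is saved as UTF-8."""
    return len(chr(b).encode("utf-8"))


def base64_size(n: int) -> int:
    """Exact base64 output length for n input bytes, including padding."""
    return 4 * ((n + 2) // 3)


avg_array = sum(array_literal_cost(b) for b in range(256)) / 256
avg_utf8 = sum(utf8_cost(b) for b in range(256)) / 256

n = 1_000_000
print(f"array literal: {avg_array:.3f} bytes/byte, {round(avg_array * n)} bytes")
print(f"UTF-8 string:  {avg_utf8:.3f} bytes/byte, {round(avg_utf8 * n)} bytes")
print(f"base64:        {base64_size(n) / n:.3f} bytes/byte, {base64_size(n)} bytes")
```

Under the uniform assumption the array literal costs about 3.57 bytes per heap byte, versus 1.5 for a raw UTF-8 string and about 1.33 for base64, so base64 would actually beat the UTF-8 string on such data.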
>>> If no one objects, I'll work on implementing the latter.
>>>
>>> Soeren
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "emscripten-discuss" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
>>
>> --
>> Chad Austin
>> Technical Director, IMVU
>> http://engineering.imvu.com <http://www.imvu.com/members/Chad/>
>> http://chadaustin.me
>>
>>



-- 
Chad Austin
Technical Director, IMVU
http://engineering.imvu.com <http://www.imvu.com/members/Chad/>
http://chadaustin.me

