It seems these guys already did a pretty thorough analysis and concluded that "ministr" is the way to go. So far, I have tried base64, which already gives me a massive improvement for the uncompressed JavaScript (18 MB, down from 23 MB) and also a small improvement (~200 kB) for the gzip -9 files.
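A rough sketch of what such a base64 experiment could look like, for illustration only. This is not the actual script: the function names, the minimum array length of 8, and the byte-range check are all assumptions.

```javascript
// Hypothetical helpers; none of these names exist in Emscripten.

// Build-time side: encode an array of byte values (-128..255) as base64.
function encodeBytesToBase64(bytes) {
  var binary = "";
  for (var i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i] & 0xff);
  }
  // btoa exists in browsers and recent Node versions; fall back to Buffer.
  return typeof btoa === "function"
    ? btoa(binary)
    : Buffer.from(binary, "binary").toString("base64");
}

// Run-time side: the wrapper that turns the base64 string into an Int8Array.
function decodeBase64ToInt8Array(b64) {
  var binary = typeof atob === "function"
    ? atob(b64)
    : Buffer.from(b64, "base64").toString("binary");
  var out = new Int8Array(binary.length);
  for (var i = 0; i < binary.length; i++) {
    out[i] = binary.charCodeAt(i); // values above 127 wrap to negative int8
  }
  return out;
}

// Build-time transform: replace integer array literals with a decoder call.
// Assumption: only arrays with at least 8 elements are worth rewriting.
function transformSource(src) {
  var intArray = /\[(\s*-?\d+\s*(?:,\s*-?\d+\s*){7,})\]/g;
  return src.replace(intArray, function (match, body) {
    var values = body.split(",").map(Number);
    // Leave arrays alone if any value does not fit into a single byte.
    for (var i = 0; i < values.length; i++) {
      if (values[i] < -128 || values[i] > 255) return match;
    }
    return 'decodeBase64ToInt8Array("' + encodeBytesToBase64(values) + '")';
  });
}
```

Base64 spends 4 characters per 3 bytes, whereas the array literal spends 2 to 4 characters per byte (digits plus comma), so the uncompressed file shrinks.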
So far, my tryout implementation is based on a script that I run using --js-transform. It uses regular expressions to find integer arrays and replaces each of them with a base64 string wrapped in a function call that turns it into an int8 array.

I like the ministr approach as it preserves the (printable) byte sequences (thus benefiting the readability of string literals) and apparently speeds up parsing. If only they had provided their escaping code for non-printable characters. Also, I still need to figure out where exactly the "allocate([....], ...)" calls are generated and change the code in there.

If only for the sake of speeding up the JS parser, I wonder if some basic inline RLE compression could be done as well. It would most probably not help with the gzipped file, but it would keep the uncompressed JS file smaller and potentially speed up parsing, at the expense of a small runtime overhead to expand the RLE-encoded byte sequences into a region on the heap.

Soeren

On Monday, December 22, 2014 7:58:26 AM UTC+10, Chad Austin wrote:
>
> Hi Soeren,
>
> @evanw and I have done similar research in this issue:
> https://github.com/kripken/emscripten/issues/2188
>
> If we represent the meminit block as a large string literal rather than an
> array of 8-bit numbers, it would reduce code size by about 50%, improve
> JavaScript parse time, AND make it more readable, as C string literals
> would be visible in the output.
>
> Fixing this has been on our wishlist for some time and if you want to take
> a crack at it, we would be thrilled!
>
> Let me know if there's anything we can do to help,
> Chad
>
>
> On Sat, Dec 20, 2014 at 11:48 PM, Soeren Balko <[email protected]> wrote:
>
>> I played around with the separate memory init file and was surprised to
>> see that it does, in fact, increase the total code size.
>> In fact, the numbers I got are:
>>
>> * JS with inline memory initialization: 23186642 bytes
>> * JS and separate memory init file: 15250276+8988744 = 24239020 bytes
>>
>> That's a bit surprising to me as I would expect the binary memory init
>> file to spend one byte per, well, byte in HEAP8. Also, the inline memory
>> initializer is a plain JS array, which is unnecessarily large (each value
>> takes 1-3 bytes per heap byte, plus 1 byte for the comma). If the
>> initial memory values were encoded as a UTF-8 string (and retrieved at
>> runtime using String.charCodeAt), there would be only 1-2 bytes per
>> "entry" (= byte on the heap), or 1.5 bytes on average if the memory init
>> values are uniformly distributed. Of course, that would produce
>> non-printable characters in the generated JS file. Not sure if all JS
>> interpreters would like that. If not, base64 (or basE91 for less
>> overhead - see http://base91.sourceforge.net/) would still use up less
>> space in the JS file.
>>
>> If no one objects, I would work on implementing the latter.
>>
>> Soeren
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "emscripten-discuss" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
>
> --
> Chad Austin
> Technical Director, IMVU
> http://engineering.imvu.com <http://www.imvu.com/members/Chad/>
> http://chadaustin.me

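For reference, the string-literal idea discussed in this thread (store the memory initializer as one string and read it back with String.charCodeAt) might be sketched as follows. The escaping scheme here is an assumption made for illustration: printable ASCII is kept as-is, and everything else becomes a \xNN escape. It is not the escaping used in the linked issue, and the function names are hypothetical.

```javascript
// Hypothetical helpers; the escaping rules are an assumption, not ministr's.

// Build-time side: emit a double-quoted JS string literal for the bytes.
function bytesToStringLiteral(bytes) {
  var out = '"';
  for (var i = 0; i < bytes.length; i++) {
    var b = bytes[i] & 0xff;
    if (b === 0x22 || b === 0x5c) {
      // Double quote and backslash need a backslash escape.
      out += "\\" + String.fromCharCode(b);
    } else if (b >= 0x20 && b <= 0x7e) {
      // Printable ASCII stays readable in the generated file.
      out += String.fromCharCode(b);
    } else {
      // Everything else becomes a two-digit hex escape.
      out += "\\x" + (b < 16 ? "0" : "") + b.toString(16);
    }
  }
  return out + '"';
}

// Run-time side: copy the char codes of the parsed string into an Int8Array.
function stringToInt8Array(s) {
  var out = new Int8Array(s.length);
  for (var i = 0; i < s.length; i++) {
    out[i] = s.charCodeAt(i); // codes above 127 wrap to negative int8
  }
  return out;
}
```

With this scheme a printable byte costs 1 character and any other byte costs 4, so the gain over the array form depends on how much of the initializer is text.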