On Sun, Dec 21, 2014 at 10:22 PM, Soeren Balko <[email protected]> wrote:
> So far, my tryout implementation is based on a script that I run using
> --js-transform. It uses regular expressions to find integer arrays and
> replaces them with a base64 string and a function wrapper that turns the
> string into an int8 array. I like the ministr approach as it preserves
> the (printable) byte sequences (thus benefiting readability of string
> literals) and apparently speeds up parsing time. If only they had provided
> their escaping code for non-printable characters.
>

Here is the code I wrote for my tests:
https://github.com/chadaustin/Web-Benchmarks/blob/master/meminit/meminit.py

Evan pointed out that my code is incorrect in the case of an octal escape
followed by numeric digits, but I don't think he posted his code.

> Also, I still need to figure out where exactly the "allocate([....], ...)"
> calls are generated and change the code in there.
>
> If only for the sake of speeding up the JS parser, I wonder if some basic
> inline RLE compression could be done as well. It would most probably not
> help with the gzipped file, but it would keep the uncompressed JS file
> smaller and potentially speed up parsing, at the expense of a small runtime
> overhead to expand the RLE-encoded byte sequences into a region on the heap.
>

Hm, I wonder if the improved JS parse time would be offset by the more
complex decoding / startup JITting. Probably worth measuring.

Either way, a straight-up string literal would be a huge improvement over
the status quo for people who can't or don't want to use a separate meminit
binary file.

Thanks for investigating this. :)

> Soeren
>
>
> On Monday, December 22, 2014 7:58:26 AM UTC+10, Chad Austin wrote:
>>
>> Hi Soeren,
>>
>> @evanw and I have done similar research in this issue:
>> https://github.com/kripken/emscripten/issues/2188
>>
>> If we represent the meminit block as a large string literal rather than
>> an array of 8-bit numbers, it would reduce code size by about 50%, improve
>> JavaScript parse time, AND make it more readable, as C string literals
>> would be visible in the output.
>>
>> Fixing this has been on our wishlist for some time and if you want to
>> take a crack at it, we would be thrilled!
>>
>> Let me know if there's anything we can do to help,
>> Chad
>>
>>
>> On Sat, Dec 20, 2014 at 11:48 PM, Soeren Balko <[email protected]> wrote:
>>
>>> I played around with the separate memory init file and was surprised to
>>> see that it does, in fact, increase the total code size. The numbers I
>>> got are:
>>>
>>> * JS with inline memory initialization: 23186642 bytes
>>> * JS and separate memory init file: 15250276+8988744 = 24239020 bytes
>>>
>>> That's a bit surprising to me as I would expect the binary memory init
>>> file to spend one byte per, well, byte in HEAP8. Also, the inline memory
>>> initializer is a plain JS array, which is unnecessarily large (each value
>>> takes 1-3 bytes per heap byte plus 1 byte for the comma). If the initial
>>> memory values were encoded as a UTF-8 string (and retrieved at runtime
>>> using String.charCodeAt), there would be only 1-2 bytes per "entry"
>>> (= byte on the heap); on average, if memory init values are uniformly
>>> distributed, 1.5 bytes. Of course, that would produce non-printable
>>> characters in the generated JS file. Not sure if all JS interpreters would
>>> like that. If not, base64 (or basE91 for less overhead - see
>>> http://base91.sourceforge.net/) would still use up less space in the
>>> JS file.
>>>
>>> If no one objects, I would work on implementing the latter.
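As a rough check on the per-byte estimates in the message above, here is a minimal Python sketch that measures a few textual encodings of the same byte sequence. The function name `encoded_sizes` and the sample input are made up for illustration; the UTF-8 figure assumes each heap byte maps to one code point (0-255) and ignores any escaping of quotes, backslashes, or control characters that a real JS string literal would need.

```python
import base64

def encoded_sizes(data: bytes) -> dict:
    """Return the size in bytes of several textual encodings of `data`."""
    # Plain JS array body, e.g. 0,17,255 (1-3 digits per value plus commas).
    array_literal = ",".join(str(b) for b in data)
    # One code point per heap byte, stored as UTF-8 (1 byte below 128, else 2).
    utf8_string = data.decode("latin-1").encode("utf-8")
    return {
        "raw": len(data),
        "array": len(array_literal),
        "utf8": len(utf8_string),
        "base64": len(base64.b64encode(data)),
    }

# Hypothetical sample: every byte value exactly once (a uniform distribution).
sizes = encoded_sizes(bytes(range(256)))
```

For this uniform sample the UTF-8 form comes out at 1.5 bytes per heap byte, matching the estimate quoted above, and base64 at roughly 1.34. Real memory init data is not uniform, so it is worth measuring on an actual build.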
>>>
>>> Soeren
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "emscripten-discuss" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
>>
>> --
>> Chad Austin
>> Technical Director, IMVU
>> http://engineering.imvu.com <http://www.imvu.com/members/Chad/>
>> http://chadaustin.me
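The octal-escape problem mentioned at the top of this thread can be illustrated with a short Python sketch. This is not the code from meminit.py, and it is not Evan's fix; `js_string_literal` and `decode_js_literal` are hypothetical names. The encoder keeps printable ASCII as-is and writes everything else as an octal escape, padding to three digits whenever the next byte is an ASCII digit, so that the digit cannot be absorbed into the escape. The decoder mimics the legacy JavaScript octal rule (up to three digits if the first is 0-3, otherwise up to two) only for the subset the encoder emits.

```python
def js_string_literal(data: bytes) -> str:
    """Encode bytes as a double-quoted JS string literal, one char per byte."""
    out = ['"']
    for i, b in enumerate(data):
        ch = chr(b)
        if ch in '"\\':
            out.append("\\" + ch)            # escape quote and backslash
        elif 0x20 <= b < 0x7F:
            out.append(ch)                   # printable ASCII stays readable
        else:
            nxt = data[i + 1] if i + 1 < len(data) else None
            if nxt is not None and 0x30 <= nxt <= 0x39:
                out.append("\\%03o" % b)     # pad: next byte is a digit
            else:
                out.append("\\%o" % b)       # shortest octal form
    out.append('"')
    return "".join(out)


def decode_js_literal(lit: str) -> bytes:
    """Decode the subset emitted above, following legacy octal escape rules."""
    body = lit[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] != "\\":
            out.append(ord(body[i]))
            i += 1
        elif body[i + 1] in '"\\':
            out.append(ord(body[i + 1]))
            i += 2
        else:
            j = i + 1
            limit = 3 if body[j] in "0123" else 2
            while j < len(body) and j - i - 1 < limit and body[j] in "01234567":
                j += 1
            out.append(int(body[i + 1:j], 8))
            i = j
    return bytes(out)
```

Without the padding, a NUL byte followed by the character "1" would be written as `\01`, which reads back as the single byte 1. Octal escapes are legacy syntax and are rejected in strict-mode code, so a fixed-width `\xNN` escape is a possible alternative at the cost of a slightly larger file.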
