FYI - To work around the problem, I went ahead and wrote a Python script
that mimicked Emscripten's JS optimizer Python file to split up the large
ASM JS file (using UglifyJS) into a number of smaller JS files, each roughly
a configurable size (I used 512 KB). The script also modifies the
Emscripten-generated HTML to include all the smaller JS files instead of the
one large file. It's come in super handy for debugging large JS files. If
there's enough interest in porting this back to Emscripten, I can make the
code available. If this sounds interesting, read on :).
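The size-based splitting described above can be sketched as a greedy packer. This is a hypothetical JavaScript illustration, not the author's Python script: packIntoFiles and its inputs are made-up names, and it assumes the ASM JS has already been broken into one source string per function (the step the author used UglifyJS for).

```javascript
// Hypothetical sketch, not the author's script: pack function-level source
// strings into output files of roughly maxBytes each.
function packIntoFiles(functionSources, maxBytes) {
  const files = [];
  let current = [];
  let currentBytes = 0;
  for (const src of functionSources) {
    // Whole functions only: a function is never split across two files.
    current.push(src);
    currentBytes += Buffer.byteLength(src, 'utf8');
    // Move on to a new file once this one has gone past the size budget.
    if (currentBytes >= maxBytes) {
      files.push(current.join('\n'));
      current = [];
      currentBytes = 0;
    }
  }
  // Flush whatever is left over into a final, smaller file.
  if (current.length > 0) {
    files.push(current.join('\n'));
  }
  return files;
}

const files = packIntoFiles(['aaaa', 'bbbb', 'cc', 'dddddd'], 6);
console.log(files.length); // 2
```

With the 512 KB setting mentioned above, the call would be packIntoFiles(functionSources, 512 * 1024). Because a file is only closed after it goes past the limit, each file can overshoot by at most one function, which keeps every function intact.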
The basic idea was to chunk up the ASM JS at function-level granularity,
moving on to a new file once the current one went past a set size (say,
512 KB). While this sounded good in theory, in practice it turned out to be
a bit more complex:
a. ASM JS implements the module pattern, which uses JavaScript closures to
isolate private state from surrounding code. It does this with a single
function that implements all the functionality and returns an object of
function "exports". The key question was: how could we break up the ASM JS
into multiple files if it was all implemented inside a single function
(containing the thousands of variables and functions that implement the
ASM JS functionality)?
b. Scoping (hoisting) issues: the generated JS code makes liberal use of
global variables that are declared later on in the file. Within a single
script, the JS engine hoists these declarations, so earlier code sees them
in scope with a value of undefined. If we were to break up the ASM JS into
multiple files, code that read a global declared only in a later file would
instead throw a ReferenceError. We had a similar problem with functions,
although this was mainly just in the preamble code that referenced a few
functions defined after the ASM JS code.
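To make point (b) concrete, here is a small sketch. It assumes Node.js's built-in vm module as a stand-in for a web page: each vm.runInContext call runs a separate script against one shared global object, roughly like separate script tags. Within one script, a read of a later var just sees undefined; split across two scripts, the same read throws.

```javascript
const vm = require('vm');

// Runs each source string as a separate script against one shared global,
// roughly how separate <script> tags behave on a page.
function runAsScripts(sources) {
  const context = vm.createContext({});
  for (const src of sources) {
    vm.runInContext(src, context);
  }
  return context;
}

const useBeforeDecl = 'var early = later;';
const decl = 'var later = 42;';

// One file: `var later` is hoisted, so `early` is simply undefined.
const single = runAsScripts([useBeforeDecl + '\n' + decl]);
console.log(single.early, single.later); // undefined 42

// Two files: the first script cannot see a declaration that only exists in the second.
let errorName = null;
try {
  runAsScripts([useBeforeDecl, decl]);
} catch (e) {
  errorName = e.name; // errors from a vm context fail instanceof checks, so compare the name
}
console.log(errorName); // ReferenceError
```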
To better illustrate the problems, here is some sample code showing the
structure of the generated ASM JS, with an example testFunc function that
makes the structural changes concrete.
// PRE-ASM JS CODE
Module['testFunc'] = testFuncVar;
// EMSCRIPTEN_START_ASM
var asm = (function(global, env, buffer) {
  'use asm';
  // ASM JS global variables
  var asmJSGlobal = 0;
  // ASM JS functions
  // e.g. testFunc
  function testFunc() {
    var a = asmJSGlobal;
    return a;
  }
  return {testFunc: testFunc, <OTHER_ASM_JS_FUNCS>};
})
// EMSCRIPTEN_END_ASM
(Module.asmGlobalArg, Module.asmLibraryArg, buffer);
// POST-ASM JS CODE
var testFuncVar = asm['testFunc'];
Note the line (Module.asmGlobalArg, Module.asmLibraryArg, buffer);. It
passes those arguments to the anonymous function defined above it and
assigns that function's return value to the asm variable. In this case, asm
ends up with a testFunc member that outside code can use to invoke ASM JS
functionality. Also note that the variable testFuncVar is assigned
asm['testFunc'] later on, while Module['testFunc'] is assigned testFuncVar
up in the initial preamble code. Even though testFuncVar has not been
initialized at that point, its var declaration is hoisted, so it is in scope
but has a value of undefined.
To solve this problem, I ended up implementing an augmented module
<http://www.adequatelygood.com/JavaScript-Module-Pattern-In-Depth.html> that
used a global variable called asmJSPrivateState, which all the functions in
the various split-up ASM JS files write to. The example above would
translate into something along the lines of:
// PRE-ASM JS CODE
Module['testFunc'] = testFuncVar;
// EMSCRIPTEN_START_ASM
var asmJSPrivateState = {};
// ASM JS global variables
asmJSPrivateState.asmJSGlobal = 0;
// ASM JS functions
// e.g. testFunc
asmJSPrivateState.testFunc = function () {
  var a = asmJSPrivateState.asmJSGlobal;
  return a;
};
var asm = (function(global, env, buffer) {
  return {testFunc: asmJSPrivateState.testFunc, <OTHER_ASM_JS_FUNCTIONS>};
})
// EMSCRIPTEN_END_ASM
(Module.asmGlobalArg, Module.asmLibraryArg, buffer);
// POST-ASM JS CODE
var testFuncVar = asm['testFunc'];
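A quick way to sanity-check the augmented module is to run its pieces as separate scripts. The sketch below assumes Node.js's vm module, where each vm.runInContext call runs one script against a shared global, as a stand-in for separate script tags; the file split and the stub Module and buffer globals are illustrative only.

```javascript
const vm = require('vm');

// Hypothetical file contents following the augmented-module layout above,
// with the <OTHER_ASM_JS_FUNCTIONS> placeholder dropped.
const asmFiles = [
  // File 1: create the shared state object and the ASM JS globals.
  'var asmJSPrivateState = {};\n' +
    'asmJSPrivateState.asmJSGlobal = 0;',
  // Files 2..N: each file only attaches functions to the shared object.
  'asmJSPrivateState.testFunc = function () {\n' +
    '  var a = asmJSPrivateState.asmJSGlobal;\n' +
    '  return a;\n' +
    '};',
  // Last file: the module function now just re-exports what earlier files defined.
  'var asm = (function (global, env, buffer) {\n' +
    '  return { testFunc: asmJSPrivateState.testFunc };\n' +
    '})(Module.asmGlobalArg, Module.asmLibraryArg, buffer);',
];

// Stub globals (assumption): in a real page these come from Emscripten's own code.
const context = vm.createContext({
  Module: { asmGlobalArg: {}, asmLibraryArg: {} },
  buffer: new ArrayBuffer(16),
});

// Each runInContext call acts like one more <script> tag on the same page.
for (const src of asmFiles) {
  vm.runInContext(src, context);
}
console.log(context.asm.testFunc()); // 0

// Order still matters: a function file run before file 1 has nothing to attach to.
let outOfOrder = null;
try {
  vm.runInContext(asmFiles[1], vm.createContext({}));
} catch (e) {
  outOfOrder = e.name; // errors from another vm context fail instanceof, so compare names
}
console.log(outOfOrder); // ReferenceError
```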
On Monday, December 28, 2015 at 12:43:08 PM UTC-8, arnab choudhury wrote:
>
> Now, it becomes easy to split up the ASM JS functions into multiple files
> since each file is just updating the asmJSPrivateState variable. We can
> then modify the Emscripten HTML to include multiple script files instead of
> the original file and voila, we are done.
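The HTML step can be sketched as a simple string rewrite. This is a hypothetical helper (rewriteScriptTags, escapeRegExp and the game.N.js file names are made up for illustration), and it assumes the generated page loads the compiled output through a single script tag with a src attribute, which may not match every Emscripten version's shell HTML. The replacement tags deliberately drop async, since the split files must run in order.

```javascript
// Hypothetical helper: escape a file name for use inside a RegExp.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Replace the <script> tag that loads originalSrc with one plain <script>
// tag per chunk, in order. Plain (non-async) tags execute in document order,
// which the split files rely on.
function rewriteScriptTags(html, originalSrc, chunkSrcs) {
  const tagPattern = new RegExp(
    '<script\\b[^>]*\\bsrc=["\']' + escapeRegExp(originalSrc) + '["\'][^>]*>\\s*</script>'
  );
  if (!tagPattern.test(html)) {
    throw new Error(`no <script> tag loading ${originalSrc} found`);
  }
  const tags = chunkSrcs.map((src) => `<script src="${src}"></script>`).join('\n');
  // Use a replacer function so "$" in file names is not treated as a pattern.
  return html.replace(tagPattern, () => tags);
}

const page = '<body>\n<script async type="text/javascript" src="game.js"></script>\n</body>';
console.log(rewriteScriptTags(page, 'game.js', ['game.0.js', 'game.1.js']));
```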
>
> The astute reader may have noted that we still haven't solved the scoping
> problem described in point (b) above. If we split up the ASM JS into
> multiple files, then testFuncVar would not even be in scope when the
> preamble code assigns it to Module['testFunc']. To get around this
> problem, we forward declare all the global variables and set them to
> undefined. They get assigned their real values when they are defined later
> on. With that in mind, the final ASM JS (along with file boundaries) would
> look as follows:
>
> // PRE-ASM JS CODE
>
> // FORWARD DECLARE ALL GLOBAL variables to deal with scoping issues
> var testFuncVar = undefined;
>
> Module['testFunc'] = testFuncVar;
>
> /////////////////////////////////////////////////////////////////////////////////
> // FILE BOUNDARY
> /////////////////////////////////////////////////////////////////////////////////
>
> // EMSCRIPTEN_START_ASM
> var asmJSPrivateState = {};
>
> // ASM JS global variables
> asmJSPrivateState.asmJSGlobal = 0;
>
> // ASM JS functions
>
> /////////////////////////////////////////////////////////////////////////////////
> // MULTIPLE FILE BOUNDARIES
> /////////////////////////////////////////////////////////////////////////////////
>
> // e.g. testFunc
> asmJSPrivateState.testFunc = function () {
>   var a = asmJSPrivateState.asmJSGlobal;
>   return a;
> };
>
> /////////////////////////////////////////////////////////////////////////////////
> // FILE BOUNDARY
> /////////////////////////////////////////////////////////////////////////////////
>
> var asm = (function(global, env, buffer) {
>   return {testFunc: asmJSPrivateState.testFunc, <OTHER_ASM_JS_FUNCTIONS>};
> })
> // EMSCRIPTEN_END_ASM
> (Module.asmGlobalArg, Module.asmLibraryArg, buffer);
>
> // POST-ASM JS CODE
> var testFuncVar = asm['testFunc'];
>
> With such a change, we can debug through the same ASM JS code, but split
> up into multiple files. You no longer have to rely on the debugger being
> able to load files that can be larger than 50 MB. Note that the final JS,
> while functional, may not actually be valid ASM JS per the spec. For
> debugging purposes, unless the bug you are trying to fix actually depends
> on ASM JS validation, you should be just fine.
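Putting the pieces together, here is a sketch of the final layout, again assuming Node's vm module as a stand-in for separate script tags (forwardDeclarations is a hypothetical helper, and the stub Module and buffer values are placeholders). It shows that the forward declaration turns the preamble's ReferenceError back into the single-file behavior: Module['testFunc'] still captures undefined, exactly as in the unsplit original, while testFuncVar gets its real value once the last file runs.

```javascript
const vm = require('vm');

// Hypothetical helper: one forward declaration per global that earlier files
// reference before the file that really declares it.
function forwardDeclarations(names) {
  return names.map((name) => `var ${name} = undefined;`).join('\n');
}

// File 1: forward declarations plus the preamble code.
const preamble =
  forwardDeclarations(['testFuncVar']) + '\n' +
  "Module['testFunc'] = testFuncVar;";

// Files 2..N, collapsed into one string here for brevity.
const asmSection =
  'var asmJSPrivateState = {};\n' +
  'asmJSPrivateState.asmJSGlobal = 0;\n' +
  'asmJSPrivateState.testFunc = function () { return asmJSPrivateState.asmJSGlobal; };\n' +
  'var asm = (function (global, env, buffer) {\n' +
  '  return { testFunc: asmJSPrivateState.testFunc };\n' +
  '})(Module.asmGlobalArg, Module.asmLibraryArg, buffer);';

// Last file: post-ASM code. Re-declaring testFuncVar with var is allowed.
const postAsm = "var testFuncVar = asm['testFunc'];";

function makeContext() {
  // Stub globals standing in for what Emscripten's own code provides.
  return vm.createContext({
    Module: { asmGlobalArg: {}, asmLibraryArg: {} },
    buffer: new ArrayBuffer(16),
  });
}

// Each runInContext call behaves like a separate <script> tag sharing one global.
const context = makeContext();
for (const src of [preamble, asmSection, postAsm]) {
  vm.runInContext(src, context);
}
console.log(context.Module.testFunc); // undefined, same as in the single-file original
console.log(context.testFuncVar()); // 0

// Without the forward declaration, the very first file fails.
let withoutDeclaration = null;
try {
  vm.runInContext("Module['testFunc'] = testFuncVar;", makeContext());
} catch (e) {
  withoutDeclaration = e.name;
}
console.log(withoutDeclaration); // ReferenceError
```

The list of names passed to forwardDeclarations is hard-coded here; how the real script collects it is not covered in the thread.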
>
>
>
> On Monday, December 14, 2015 at 8:55:50 AM UTC-8, jj wrote:
>>
>> We have two items in the Firefox issue tracker regarding debugging large
>> files:
>>
>> https://bugzilla.mozilla.org/show_bug.cgi?id=1158098 - Improve debugger
>> to navigate and search large UE4 StrategyGame JavaScript file.
>> https://bugzilla.mozilla.org/show_bug.cgi?id=1224726 - High memory
>> consumption when opening and searching a large Javascript file in debugger.
>>
>> These are on the radar for 2016, so we are hoping to improve things
>> greatly here.
>>
>> 2015-12-04 23:16 GMT+02:00 Alon Zakai <[email protected]>:
>>
>>> Yes, this is a known issue. There is progress on the browser side to
>>> more efficiently handle such large programs, but no browser does this well
>>> yet.
>>>
>>> Emscripten supports dynamic linking,
>>>
>>> https://github.com/kripken/emscripten/wiki/Linking
>>>
>>> This can be a solution for this problem, by splitting things up into
>>> smaller files.
>>>
>>> In practice, personally, I tend to use print debugging and I open the
>>> file in a text editor on the side that can handle massive text files.
>>>
>>> On Fri, Dec 4, 2015 at 12:02 PM, arnab choudhury <[email protected]>
>>> wrote:
>>>
>>>> Hey all
>>>>
>>>> I'm using Emscripten to convert a decent-sized C++ codebase to
>>>> JavaScript. As part of this process, I'm finding that debugging the
>>>> unoptimized generated JS can be quite painful. Specifically, some JS files
>>>> can be up to 1 million lines long, and this completely breaks my browser's
>>>> Node debugger (via node-inspector). Google's V8 debugger (via node debug)
>>>> also has a hard time stepping through code. The only debugging technique
>>>> that currently works for me is to add print statements, then build and
>>>> debug iteratively.
>>>>
>>>> I saw some threads on GitHub about Emscripten having the ability to
>>>> split up the generated JS files into multiple files. However, the latest
>>>> version of Emscripten doesn't seem to support this. Have others run into
>>>> this issue? Are there any existing solutions for this problem?
>>>>
>>>> Thanks,
>>>> Arnab
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "emscripten-discuss" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>>
>>
>>