This is great, Alon!

I wonder if it would be possible to avoid parsing JS at all? In the
fastcomp JSBackend, we output the lowered AST to a string of JS code, just
to parse that generated JS code back into another AST form in order to
perform the JS optimizations on it. Would it be possible to convert the
LLVM AST directly to the desired target JS AST form without routing via an
intermediate string representation?

   Jukka


2014-11-17 6:24 GMT+02:00 Marc <[email protected]>:

> I started a JS parser today (named "parsen" :-) ).
>
> I have no GitHub account yet, so I put the code as an attachment
> ("parsen.main.cpp").
>
> The objective is to make a program which parses its own code (the JS
> output from emscripten).
>
> For the moment, I have a lexer. It splits lines ("\n"), spaces (blanks),
> words ("alnum"), and brackets (open/close, curly, square, ...).
>
> You can look at the "bool parse(const char* path)" function in the
> source file.
>
> I want to provide the JS AST as a flat indexed array (a parent/child
> table).
>
> Each cell should have:
>  - its type (a friendly C++11 "enum class": "function", "var", "+", ...)
>  - the index of its parent (in the array)
>  - the line/column number in the source file
>  - its name/content or value (floats, ...)
>  - ...
>
> I can also decode the special asm.js annotations ("|0", ...).
>
> The final idea is to be able to run queries over the AST (like LINQ).
>
> Do something like:
>  ast.select("function").select("var").dump();
>
> Can you tell me more about your needs? What kind of patterns are you
> looking for from the optimizer?
>
> PS: Sorry for the many mistakes. I'm French and not very good at
> English!
>
> On Sun, 16 Nov 2014 11:02:45 -0800,
> Alon Zakai <[email protected]> wrote:
>
> > The goal is to parse the JS output of the fastcomp LLVM backend. Then
> > we run optimization passes on that AST.
> >
> > Thanks for the pointer to TinyJS, it looks interesting! OK, at this
> > point I am considering 3 options:
> >
> > 1. Modify TinyJS parser (already in C++, which is good)
> > 2. Port Higgs parser from D (nicest written code of all the options)
> > 3. Port Acorn parser from JS
> >
> > I am leaning toward the last, because it seems the most active and
> > maintained, and has support for parsing ES6 already (we don't need
> > that immediately, but eventually we might). Also it is the only one
> > that has focused on parsing speed, as far as I can tell.
> >
> > - Alon
> >
> >
> >
> > On Fri, Nov 14, 2014 at 7:44 PM, Marc <[email protected]> wrote:
> >
> > > This one is not bad:
> > >  https://code.google.com/p/tiny-js/source/browse/trunk/TinyJS.h
> > >
> > > There are only two files to include.
> > >
> > > The license is OK (MIT-like).
> > >
> > > Which part of the JS files do you want to parse? Is it the generated
> > > "LLVM as JS" output or one of the libraries you've made (like
> > > "parseTools.js" or "analyzer.js")?
> > >
> > > I've looked a bit at ANTLR, but the grammar files for JavaScript are
> > > a bit old.
> > >
> > > There is a more "exotic" alternative I can imagine: using
> > > this Haskell parser:
> > >
> > > https://hackage.haskell.org/package/language-javascript
> > >
> > > The grammar file is really pretty:
> > >
> > >
> > >
> https://github.com/alanz/language-javascript/blob/master/src/Language/JavaScript/Parser/Grammar5.y
> > >
> > > I know that GHC generates a kind of C (some "C--") as an
> > > intermediate code. It may be possible to wrap a function around
> > > it.
> > >
> > > It's a crazy idea :-)
> > >
> > >
> > >
> > > On Fri, 14 Nov 2014 16:43:55 -0800,
> > > Alon Zakai <[email protected]> wrote:
> > >
> > > > I wasn't familiar with that, thanks. It looks interesting;
> > > > however, the GPL license is a problem, as we do want the option
> > > > to run the parser on the client machine, linked to other code,
> > > > and this would limit the number of people who would use it.
> > > >
> > > > - Alon
> > > >
> > > >
> > > > On Fri, Nov 14, 2014 at 3:04 AM, Marc <[email protected]> wrote:
> > > >
> > > > > Do you know this one?
> > > > >  https://github.com/cesanta/v7
> > > > >
> > > > > On Thu, 13 Nov 2014 17:19:46 -0800,
> > > > > Alon Zakai <[email protected]> wrote:
> > > > >
> > > > > > Early this year the fastcomp project replaced the core
> > > > > > compiler, which was written in JS, with an LLVM backend in
> > > > > > C++, and that brought large compilation speedups. However,
> > > > > > the late JS optimization passes were still run in JS, which
> > > > > > meant optimized builds could be slow (in unoptimized builds,
> > > > > > we don't run those JS optimizations, typically). Especially
> > > > > > in very large projects, this could be annoying.
> > > > > >
> > > > > > Progress towards speeding up those JS optimization passes just
> > > > > > landed on the incoming branch, turned off. This is not yet
> > > > > > stable or ready, so it is *not* enabled by default. Feel free
> > > > > > to test it, though, and report bugs. To use it, build with
> > > > > >
> > > > > > EMCC_NATIVE_OPTIMIZER=1
> > > > > >
> > > > > > in the environment, e.g.
> > > > > >
> > > > > > EMCC_NATIVE_OPTIMIZER=1 emcc -O2 tests/hello_world.c
> > > > > >
> > > > > > It only matters when building to JS (not when building C++ to
> > > > > > object/bitcode). When EMCC_DEBUG=1 is used, you should see it
> > > > > > mention that it uses the native optimizer. The first time you
> > > > > > use it, it will also say it is compiling the optimizer, which
> > > > > > can take several seconds.
> > > > > >
> > > > > > The native optimizer is basically a port of the JS optimizer
> > > > > > passes from JS into C++11. C++11 features like lambdas made
> > > > > > this much easier than it would have been otherwise, as the JS
> > > > > > code has lots of lambdas. The ported code uses the same
> > > > > > JSON-based AST, implemented in C++.
> > > > > >
> > > > > > Using C++11 is a little risky. We build the code natively,
> > > > > > using clang from fastcomp, but we do use the system C++
> > > > > > standard libraries. In principle, if those are not
> > > > > > C++11-friendly, problems could happen. It seems to work fine
> > > > > > everywhere I have tested so far.
> > > > > >
> > > > > > Not all passes have been converted, but the main
> > > > > > time-consuming passes in -O2 have been (eliminator,
> > > > > > simplifyExpressions, registerize). (Note that in -O3 the
> > > > > > registerizeHarder pass has *not* yet been converted.) The
> > > > > > toolchain can handle running some passes in JS and some
> > > > > > passes natively, using JSON to serialize them.
> > > > > >
> > > > > > Potentially this approach can speed us up very significantly,
> > > > > > but it isn't quite there yet. JSON parsing/unparsing and
> > > > > > running the passes themselves can be done natively, and in
> > > > > > tests I see that running 4x faster, and using about half as
> > > > > > much memory. However, there is overhead from serializing JSON
> > > > > > between native and JS, which will remain until 100% of the
> > > > > > passes you use are native. Also, and more significantly, we
> > > > > > do not have a parser from JS - the output of fastcomp - to
> > > > > > the JSON AST. That means we send fastcomp output into JS
> > > > > > to be parsed, it emits JSON, and we read that in the native
> > > > > > optimizer.
> > > > > >
> > > > > > For those reasons, the current speedup is not dramatic. I see
> > > > > > around a 10% improvement, far from how much we could reach.
> > > > > >
> > > > > > Further speedups will happen as the final passes are
> > > > > > converted. The bigger issue is writing a JS parser in C++
> > > > > > for this. That is not easy, because parsing JS is not
> > > > > > easy - there are some corner cases and ambiguities. I'm
> > > > > > looking into existing code for this, but I'm not sure there is
> > > > > > anything we can easily use - JS engine parsers are in C++ but
> > > > > > tend not to be easy to detach. If anyone has good ideas here,
> > > > > > that would be useful.
> > > > > >
> > > > > > - Alon
> > > > > >
> > > > >
> > > > > --
> > > > > You received this message because you are subscribed to the
> > > > > Google Groups "emscripten-discuss" group.
> > > > > To unsubscribe from this group and stop receiving emails from
> > > > > it, send an email to
> > > > > [email protected]. For more
> > > > > options, visit https://groups.google.com/d/optout.
> > > > >
> > > >
> > >
> >
>

