[nodejs] With this utility, everyone can easily invent new languages

Simeon Chaos Fri, 10 Jan 2014 17:38:11 -0800

How is peasy (https://github.com/chaosim/peasy) simpler and more powerful 
than other parser tools?


Simpler and more powerful? Maybe it is a bit contrary to common sense. True 
or false? To affirm, please give it a glimpse at first. Because of being 
simple, you can comprehend it in a little while; because of being powerful, 
it will deserve your every minute.

All along, the design of programming languages is a complex task. Because 
we need to understand the esoteric compiler theory and technology, and one 
of the most critical and very difficult part is to define the rules of the 
new language and to parse with them.To solve this problem, there have been 
many theories , techniques and tools . These tools can be roughly divided 
into two categories: one is parser generator, another is to write the 
parser by hand with or without a parser library.

The parser generator generally based on some type of formal languages , 
by which the generator make the lexer and parser from some rules. Strong 
aspect of these tools is that they can produce the most efficient parser 
support for their own form of language types , according to the 
characteristics of formal language , generally ensures linear time 
complexity. Commonly divided into three types : LR technology , LL 
technology , PEG technology. LR technology language has shift/reduction , 
from the bottom up , the most right derived and other technical features , 
they can only support LR languages , more accurately, mainly LALR 
variants , plus some special treatment for the priority of operator. One of 
the most famous was undoubtedly the lex / yacc and its numerous subsequent 
variants ( such as bison, jison (javascript language ), ply (python lex / 
yacc) , etc. ) . LL technology has a top-down form of the language , 
recursive descent , the most left-dereived and other technical features , 
although the concept is easier to understand than the LR language , but 
because LL covers less languages, so not used as universal as lex / yacc. 
Representative products are antlr. PEG technology refers to parsing 
expression grammar parser generator. peg is a kind of formal grammar 
particularly suited to be parsed, currently there are already many tools 
appear. The most common method is packrat  algorithm based implementations. 
Such as rats, ometa (there are many versions , such as ometa / squeak, 
ometa-js, pyMeta etc. ), pyPEG under python, peg.js  under javascript, 
treetop and citrus under ruby, and so on.

No matter what type of parser generator , the method to use them is to 
design rules for these tools , and then the tools generate parser. This is 
the difficult aspects. First, the object program is generated based on the 
state transfer and stack process, it is difficult to understand, debug and 
modify. Second, we must understand parser theory and technology of these 
tools is based on, the rules used by these tools is actually a 
domain-specific language , their expressivenes is very limited, while we 
must understand and become familiar with the new language in order to 
design rules. Third, we must embed certain processing ( abstract syntax 
tree structure , semantics, error reporting and handling , etc) in the form 
of object parser language  into lex/parser rules. Difficulties in these 
areas prevent the most programmers to easily use this type of tool.
    
Meanwhile, people have also been pursuing a more relaxed and flexible 
approach to write parsers. Most of these methods produce tools can be 
classified as combinational parser library , or peg grammar-based peg 
library . Use of such libraries, programmers can use their own language in 
daily use to design a new universal language , parsing text, so they are 
used more freguently. Representative works is parsec under haskell language 
 which maybe is the most mature and powerful. However, because haskell is 
too academic, and parsec is not universally popular. c + + language has 
boost phoenix library, probably because it depends on the c++ template, an 
advanced language features, it has not been widely used too. Python has 
pyparsing, which is used by many users. For specific questions , there are 
many applications do not use any tools or libraries, but manually write the 
entire parser , for example: cpython implementation, esprima for 
javascript. However, unfortunately,  in their grammar and parsing of these 
tools, there are two obvious difficulties not been solved: the first is 
left recursive grammar, because left-recursive function will lead to an 
unconditional infinite recursion. The second is the parsing efficiency. In 
order to obtain a linear time complexity , we should only parse same syntax 
component only once at any position. To achieve the latter, the result of 
parsing is required. Meanwhile , the parser also need to be considered in 
conjunction to produce abstract syntax trees , semantic processing , error 
handling , etc., and these libraries maybe try to keep away from some of 
the problems (in the combined libraries left recursion are generally 
fobidden), or in order to solve these problems the program becomes very 
complex , difficult to understand , use, modify and extend.

Can be seen from the above description , parsing is a very common 
programming needs , but also has a very high technical difficulty , 
resulting in a large number of theories, technique, and tools you have seen 
above, in this regard a variety of technical papers, code and tools can be 
used simply to describe the voluminous , but still did not achieve the 
desired results .


with peasy, a complete and elegant solution to all these problems emerge 
for the first time. On one hand you can say peasy the most simple and 
elegant , but also has the most powerful and adaptability , and does not 
need to sacrifice speed and efficiency.
 Its characteristics are as follows:

* for the first time, peasy provides a simple and complete solution to the 
problem of left recursion in the hand-written recursive descent parser 
manner, no matter direct or indirect left-recursive rule . To handle left 
recursive in peasy, only one function is needed (only 20 lines of code in 
coffeescript).
* peasy allows programmers to write the parser with common programming 
language, and the parser and the input to the grammar rules are similar to 
the parser generator , both concise and readable , easy to modify and 
extend.
* the simple and flexible caching mechanism in peasy can improve the time 
efficiency of parsing, we can maintain a linear time for liner time 
complexity grammar by caching. For complex grammar and data , we can also 
improve time efficiency by caching mechanism, and avoide the exponential, 
cube or square time complexity like some other common parsing algorithm. 
peasy's cache implementation is very simple, only one function (only ten 
lines of code under coffeescript  ) .
* the entire peasy implementation is very simple. only two hundred lines of 
code under cofeescript. The concept is simple. Only two kind of components: 
some match functions which operate on the parsed data and parsing pointer 
and some combinational functions which generate match function. all of 
these functions are very simple , usually only a few lines of code , And 
they exists in the code of peasy for demonstration objective, can be 
deleted, modified and replaced, according to the specific needs.

In fact, instead of to be considered as a library or a tool, maybe it's 
better to view peasy as a method, a way or some example to write parsers 
mannually. This will allow for a greater degree of power.

peasy provide specific examples to support the above conclusion. The most 
obvious example is the common programming languages in order to adapt and 
modify their parser tool to prevent left- recursive grammar . For example 
python grammar (http://docs.python.org/2/reference/grammar.html), 
javascript grammar 
(http://www-archive.mozilla.org/js/language/grammar14.html), while by using 
peasy, all binary arithmetic expression can be condensed into a simple left 
-recursive rules , left recursion elimination is not needed any more. For 
expression parsing, in order to further improve time efficiency , peasy 
also provides an example , has many different priorities for computing the 
call stack example can skip directly to the highest priority constant 
direct expression resolves to the final rather than, as is currently common 
practice, the rules of writing as follows : or -> (or | | and) | and; ...; 
add -> add + mul | mul; mul -> mul * atom | atom. Rare is , peasy while 
achieving enhanced capabilities of these expressions can also ensure that 
the linear time complexity of the parsing process. Want to know how peasy 
achieve these results? Please give a glimpse to peasy on 
github(https://github.com/chaosim/peasy).

the first two of following links is the messages I posted when I started to 
get the inspiration to start writing peasy. Because there are some of other 
tasks to finish, intermediate interrupted for some time. Not long ago, I 
use peasy to parse something in another projects , and I further optimize 
the api of peasy, and which is now very simple and elegant, and I am 
satisfied :) As for the most critical left recursive , and caching 
algorithms , withstood the subsequent examples of the test , there is no 
any change. Together, these examples also continues to demonstrate the 
advantages of peasy .

Finally, I need to mention the impact of language on thought: Without 
coffeescript language, maybe it's difficult for me to have these ideas. in 
coffeescript,  the -> function notation and "assignment is an expression" 
making special concise readable grammar. In python similar results can be 
achieved, need to go through some twists and turns, and that is not so 
natural as in coffeescript.There should exists no big obstacle in other 
dynamic languages, but they is not as elegant as coffeescript tool. This 
point is obvious with the difference between the javascript code of peasy 
and its coffeescript source.

messages I posted in google groups before:
https://groups.google.com/forum/#!search/left$20recursive/comp.lang.javascript/GN7b9Tr6j98/DoiHzi9i77oJ
https://groups.google.com/forum/#!search/%E5%B7%A6%E9%80%92%E5%BD%92/python-cn/Uqr-Xf3I3LM/oUd3Sj4HvxYJ


lr grammar and lex/yacc, bison, ply
http://en.wikipedia.org/wiki/LR_parser
http://dinosaur.compilertools.net/
http://en.wikipedia.org/wiki/Yacc
http://www.dabeaz.com/ply/
http://www.gnu.org/software/bison/
http://zaach.github.io/jison/

LL grammar
http://en.wikipedia.org/wiki/LL_parser
http://www.antlr.org/


some one asked for parser to handle left recursion on SO:
http://stackoverflow.com/questions/4397039/any-peg-parser-capable-to-handle-left-recursion


peg and packrat
http://bford.info/packrat/
http://en.wikipedia.org/wiki/Parsing_expression_grammar
http://cs.nyu.edu/rgrimm/xtc/rats-intro.html
http://java-source.net/open-source/parser-generators/rats!

phoenix 
http://www.boost.org/doc/libs/1_55_0/libs/spirit/phoenix/doc/html/index.html
http://www.ontolinux.com/community/phoenix/Phoenix_Manual.pdf

ometa：
Alessandro Warth describe a method to handle left recursion in peg grammar 
in his PhD paper firstly.
After I have my idea about Peasy and algorith to handle left recursion, I 
searched and found his paper, and found my algorithm is actully similar to 
his method.
http://tinlizzie.org/ometa/
Memoization in Top-Down Parsing
http://www.tinlizzie.org/~awarth/johnson.html

https://github.com/alexwarth/ometa-js

parsing under python
https://wiki.python.org/moin/LanguageParsing

parser for javascript or in javascript
http://esprima.org/
http://zaach.github.io/jison/

-- 
-- 
Job Board: http://jobs.nodejs.org/
Posting guidelines: 
https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
You received this message because you are subscribed to the Google
Groups "nodejs" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/nodejs?hl=en?hl=en

--- 
You received this message because you are subscribed to the Google Groups 
"nodejs" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.

[nodejs] With this utility, everyone can easily invent new languages

Reply via email to