Peasy looks interesting and it's on my list to look at, but this post was too long to read.
On Fri, Jan 10, 2014 at 5:37 PM, Simeon Chaos <[email protected]>wrote: > How is peasy (https://github.com/chaosim/peasy) simpler and more powerful > than other parser tools? > > Simpler and more powerful? Maybe it is a bit contrary to common sense. > True or false? To affirm, please give it a glimpse at first. Because of > being simple, you can comprehend it in a little while; because of being > powerful, it will deserve your every minute. > > All along, the design of programming languages is a complex task. > Because we need to understand the esoteric compiler theory and technology, > and one of the most critical and very difficult part is to define the rules > of the new language and to parse with them.To solve this problem, there > have been many theories , techniques and tools . These tools can be roughly > divided into two categories: one is parser generator, another is to write > the parser by hand with or without a parser library. > > The parser generator generally based on some type of formal languages , > by which the generator make the lexer and parser from some rules. Strong > aspect of these tools is that they can produce the most efficient parser > support for their own form of language types , according to the > characteristics of formal language , generally ensures linear time > complexity. Commonly divided into three types : LR technology , LL > technology , PEG technology. LR technology language has shift/reduction , > from the bottom up , the most right derived and other technical features , > they can only support LR languages , more accurately, mainly LALR > variants , plus some special treatment for the priority of operator. One of > the most famous was undoubtedly the lex / yacc and its numerous subsequent > variants ( such as bison, jison (javascript language ), ply (python lex / > yacc) , etc. ) . LL technology has a top-down form of the language , > recursive descent , the most left-dereived and other technical features , > although the concept is easier to understand than the LR language , but > because LL covers less languages, so not used as universal as lex / yacc. > Representative products are antlr. PEG technology refers to parsing > expression grammar parser generator. peg is a kind of formal grammar > particularly suited to be parsed, currently there are already many tools > appear. The most common method is packrat algorithm based implementations. > Such as rats, ometa (there are many versions , such as ometa / squeak, > ometa-js, pyMeta etc. ), pyPEG under python, peg.js under javascript, > treetop and citrus under ruby, and so on. > > No matter what type of parser generator , the method to use them is to > design rules for these tools , and then the tools generate parser. This is > the difficult aspects. First, the object program is generated based on the > state transfer and stack process, it is difficult to understand, debug and > modify. Second, we must understand parser theory and technology of these > tools is based on, the rules used by these tools is actually a > domain-specific language , their expressivenes is very limited, while we > must understand and become familiar with the new language in order to > design rules. Third, we must embed certain processing ( abstract syntax > tree structure , semantics, error reporting and handling , etc) in the form > of object parser language into lex/parser rules. Difficulties in these > areas prevent the most programmers to easily use this type of tool. > > Meanwhile, people have also been pursuing a more relaxed and flexible > approach to write parsers. Most of these methods produce tools can be > classified as combinational parser library , or peg grammar-based peg > library . Use of such libraries, programmers can use their own language in > daily use to design a new universal language , parsing text, so they are > used more freguently. Representative works is parsec under haskell language > which maybe is the most mature and powerful. However, because haskell is > too academic, and parsec is not universally popular. c + + language has > boost phoenix library, probably because it depends on the c++ template, an > advanced language features, it has not been widely used too. Python has > pyparsing, which is used by many users. For specific questions , there are > many applications do not use any tools or libraries, but manually write the > entire parser , for example: cpython implementation, esprima for > javascript. However, unfortunately, in their grammar and parsing of these > tools, there are two obvious difficulties not been solved: the first is > left recursive grammar, because left-recursive function will lead to an > unconditional infinite recursion. The second is the parsing efficiency. In > order to obtain a linear time complexity , we should only parse same syntax > component only once at any position. To achieve the latter, the result of > parsing is required. Meanwhile , the parser also need to be considered in > conjunction to produce abstract syntax trees , semantic processing , error > handling , etc., and these libraries maybe try to keep away from some of > the problems (in the combined libraries left recursion are generally > fobidden), or in order to solve these problems the program becomes very > complex , difficult to understand , use, modify and extend. > > Can be seen from the above description , parsing is a very common > programming needs , but also has a very high technical difficulty , > resulting in a large number of theories, technique, and tools you have seen > above, in this regard a variety of technical papers, code and tools can be > used simply to describe the voluminous , but still did not achieve the > desired results . > > > with peasy, a complete and elegant solution to all these problems emerge > for the first time. On one hand you can say peasy the most simple and > elegant , but also has the most powerful and adaptability , and does not > need to sacrifice speed and efficiency. > Its characteristics are as follows: > > * for the first time, peasy provides a simple and complete solution to the > problem of left recursion in the hand-written recursive descent parser > manner, no matter direct or indirect left-recursive rule . To handle left > recursive in peasy, only one function is needed (only 20 lines of code in > coffeescript). > * peasy allows programmers to write the parser with common programming > language, and the parser and the input to the grammar rules are similar to > the parser generator , both concise and readable , easy to modify and > extend. > * the simple and flexible caching mechanism in peasy can improve the time > efficiency of parsing, we can maintain a linear time for liner time > complexity grammar by caching. For complex grammar and data , we can also > improve time efficiency by caching mechanism, and avoide the exponential, > cube or square time complexity like some other common parsing algorithm. > peasy's cache implementation is very simple, only one function (only ten > lines of code under coffeescript ) . > * the entire peasy implementation is very simple. only two hundred lines > of code under cofeescript. The concept is simple. Only two kind of > components: some match functions which operate on the parsed data and > parsing pointer and some combinational functions which generate match > function. all of these functions are very simple , usually only a few lines > of code , And they exists in the code of peasy for demonstration objective, > can be deleted, modified and replaced, according to the specific needs. > > In fact, instead of to be considered as a library or a tool, maybe it's > better to view peasy as a method, a way or some example to write parsers > mannually. This will allow for a greater degree of power. > > peasy provide specific examples to support the above conclusion. The most > obvious example is the common programming languages in order to adapt and > modify their parser tool to prevent left- recursive grammar . For example > python grammar (http://docs.python.org/2/reference/grammar.html), > javascript grammar ( > http://www-archive.mozilla.org/js/language/grammar14.html), while by > using peasy, all binary arithmetic expression can be condensed into a > simple left -recursive rules , left recursion elimination is not needed any > more. For expression parsing, in order to further improve time efficiency , > peasy also provides an example , has many different priorities for > computing the call stack example can skip directly to the highest priority > constant direct expression resolves to the final rather than, as is > currently common practice, the rules of writing as follows : or -> (or | | > and) | and; ...; add -> add + mul | mul; mul -> mul * atom | atom. Rare is > , peasy while achieving enhanced capabilities of these expressions can also > ensure that the linear time complexity of the parsing process. Want to know > how peasy achieve these results? Please give a glimpse to peasy on github( > https://github.com/chaosim/peasy). > > the first two of following links is the messages I posted when I started > to get the inspiration to start writing peasy. Because there are some of > other tasks to finish, intermediate interrupted for some time. Not long > ago, I use peasy to parse something in another projects , and I further > optimize the api of peasy, and which is now very simple and elegant, and I > am satisfied :) As for the most critical left recursive , and caching > algorithms , withstood the subsequent examples of the test , there is no > any change. Together, these examples also continues to demonstrate the > advantages of peasy . > > Finally, I need to mention the impact of language on thought: Without > coffeescript language, maybe it's difficult for me to have these ideas. in > coffeescript, the -> function notation and "assignment is an expression" > making special concise readable grammar. In python similar results can be > achieved, need to go through some twists and turns, and that is not so > natural as in coffeescript.There should exists no big obstacle in other > dynamic languages, but they is not as elegant as coffeescript tool. This > point is obvious with the difference between the javascript code of peasy > and its coffeescript source. > > messages I posted in google groups before: > > https://groups.google.com/forum/#!search/left$20recursive/comp.lang.javascript/GN7b9Tr6j98/DoiHzi9i77oJ > > https://groups.google.com/forum/#!search/%E5%B7%A6%E9%80%92%E5%BD%92/python-cn/Uqr-Xf3I3LM/oUd3Sj4HvxYJ > > > lr grammar and lex/yacc, bison, ply > http://en.wikipedia.org/wiki/LR_parser > http://dinosaur.compilertools.net/ > http://en.wikipedia.org/wiki/Yacc > http://www.dabeaz.com/ply/ > http://www.gnu.org/software/bison/ > http://zaach.github.io/jison/ > > LL grammar > http://en.wikipedia.org/wiki/LL_parser > http://www.antlr.org/ > > > some one asked for parser to handle left recursion on SO: > > http://stackoverflow.com/questions/4397039/any-peg-parser-capable-to-handle-left-recursion > > > peg and packrat > http://bford.info/packrat/ > http://en.wikipedia.org/wiki/Parsing_expression_grammar > http://cs.nyu.edu/rgrimm/xtc/rats-intro.html > http://java-source.net/open-source/parser-generators/rats! > > phoenix > > http://www.boost.org/doc/libs/1_55_0/libs/spirit/phoenix/doc/html/index.html > http://www.ontolinux.com/community/phoenix/Phoenix_Manual.pdf > > ometa: > Alessandro Warth describe a method to handle left recursion in peg grammar > in his PhD paper firstly. > After I have my idea about Peasy and algorith to handle left recursion, I > searched and found his paper, and found my algorithm is actully similar to > his method. > http://tinlizzie.org/ometa/ > Memoization in Top-Down Parsing > http://www.tinlizzie.org/~awarth/johnson.html > > https://github.com/alexwarth/ometa-js > > parsing under python > https://wiki.python.org/moin/LanguageParsing > > parser for javascript or in javascript > http://esprima.org/ > http://zaach.github.io/jison/ > > -- > -- > Job Board: http://jobs.nodejs.org/ > Posting guidelines: > https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines > > You received this message because you are subscribed to the Google > Groups "nodejs" group. > To post to this group, send email to [email protected] > To unsubscribe from this group, send email to > [email protected] > For more options, visit this group at > http://groups.google.com/group/nodejs?hl=en?hl=en > > --- > You received this message because you are subscribed to the Google Groups > "nodejs" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > > For more options, visit https://groups.google.com/groups/opt_out. > -- -- Job Board: http://jobs.nodejs.org/ Posting guidelines: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines You received this message because you are subscribed to the Google Groups "nodejs" group. To post to this group, send email to [email protected] To unsubscribe from this group, send email to [email protected] For more options, visit this group at http://groups.google.com/group/nodejs?hl=en?hl=en --- You received this message because you are subscribed to the Google Groups "nodejs" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/groups/opt_out.
