Re: [nodejs] With this utility, everyone can easily invent new languages

Simeon Chaos Fri, 10 Jan 2014 20:51:58 -0800

aha, indeed. maybe I should blog it.

在 2014年1月11日星期六UTC+8上午9时40分31秒，Mark Hahn写道：
>
> Peasy looks interesting and it's on my list to look at, but this post was 
> too long to read.
>
>
> On Fri, Jan 10, 2014 at 5:37 PM, Simeon Chaos 
> <[email protected]<javascript:>
> > wrote:
>
>> How is peasy (https://github.com/chaosim/peasy) simpler and more 
>> powerful than other parser tools?
>>
>> Simpler and more powerful? Maybe it is a bit contrary to common sense. 
>> True or false? To affirm, please give it a glimpse at first. Because of 
>> being simple, you can comprehend it in a little while; because of being 
>> powerful, it will deserve your every minute.
>>
>> All along, the design of programming languages is a complex task. 
>> Because we need to understand the esoteric compiler theory and technology, 
>> and one of the most critical and very difficult part is to define the rules 
>> of the new language and to parse with them.To solve this problem, there 
>> have been many theories , techniques and tools . These tools can be roughly 
>> divided into two categories: one is parser generator, another is to write 
>> the parser by hand with or without a parser library.
>>
>> The parser generator generally based on some type of formal languages , 
>> by which the generator make the lexer and parser from some rules. Strong 
>> aspect of these tools is that they can produce the most efficient parser 
>> support for their own form of language types , according to the 
>> characteristics of formal language , generally ensures linear time 
>> complexity. Commonly divided into three types : LR technology , LL 
>> technology , PEG technology. LR technology language has shift/reduction , 
>> from the bottom up , the most right derived and other technical features , 
>> they can only support LR languages , more accurately, mainly LALR 
>> variants , plus some special treatment for the priority of operator. One of 
>> the most famous was undoubtedly the lex / yacc and its numerous subsequent 
>> variants ( such as bison, jison (javascript language ), ply (python lex / 
>> yacc) , etc. ) . LL technology has a top-down form of the language , 
>> recursive descent , the most left-dereived and other technical features , 
>> although the concept is easier to understand than the LR language , but 
>> because LL covers less languages, so not used as universal as lex / yacc. 
>> Representative products are antlr. PEG technology refers to parsing 
>> expression grammar parser generator. peg is a kind of formal grammar 
>> particularly suited to be parsed, currently there are already many tools 
>> appear. The most common method is packrat  algorithm based implementations. 
>> Such as rats, ometa (there are many versions , such as ometa / squeak, 
>> ometa-js, pyMeta etc. ), pyPEG under python, peg.js  under javascript, 
>> treetop and citrus under ruby, and so on.
>>
>> No matter what type of parser generator , the method to use them is to 
>> design rules for these tools , and then the tools generate parser. This is 
>> the difficult aspects. First, the object program is generated based on the 
>> state transfer and stack process, it is difficult to understand, debug and 
>> modify. Second, we must understand parser theory and technology of these 
>> tools is based on, the rules used by these tools is actually a 
>> domain-specific language , their expressivenes is very limited, while we 
>> must understand and become familiar with the new language in order to 
>> design rules. Third, we must embed certain processing ( abstract syntax 
>> tree structure , semantics, error reporting and handling , etc) in the form 
>> of object parser language  into lex/parser rules. Difficulties in these 
>> areas prevent the most programmers to easily use this type of tool.
>>     
>> Meanwhile, people have also been pursuing a more relaxed and flexible 
>> approach to write parsers. Most of these methods produce tools can be 
>> classified as combinational parser library , or peg grammar-based peg 
>> library . Use of such libraries, programmers can use their own language in 
>> daily use to design a new universal language , parsing text, so they are 
>> used more freguently. Representative works is parsec under haskell language 
>>  which maybe is the most mature and powerful. However, because haskell is 
>> too academic, and parsec is not universally popular. c + + language has 
>> boost phoenix library, probably because it depends on the c++ template, an 
>> advanced language features, it has not been widely used too. Python has 
>> pyparsing, which is used by many users. For specific questions , there are 
>> many applications do not use any tools or libraries, but manually write the 
>> entire parser , for example: cpython implementation, esprima for 
>> javascript. However, unfortunately,  in their grammar and parsing of these 
>> tools, there are two obvious difficulties not been solved: the first is 
>> left recursive grammar, because left-recursive function will lead to an 
>> unconditional infinite recursion. The second is the parsing efficiency. In 
>> order to obtain a linear time complexity , we should only parse same syntax 
>> component only once at any position. To achieve the latter, the result of 
>> parsing is required. Meanwhile , the parser also need to be considered in 
>> conjunction to produce abstract syntax trees , semantic processing , error 
>> handling , etc., and these libraries maybe try to keep away from some of 
>> the problems (in the combined libraries left recursion are generally 
>> fobidden), or in order to solve these problems the program becomes very 
>> complex , difficult to understand , use, modify and extend.
>>
>> Can be seen from the above description , parsing is a very common 
>> programming needs , but also has a very high technical difficulty , 
>> resulting in a large number of theories, technique, and tools you have seen 
>> above, in this regard a variety of technical papers, code and tools can be 
>> used simply to describe the voluminous , but still did not achieve the 
>> desired results .
>>
>>
>> with peasy, a complete and elegant solution to all these problems emerge 
>> for the first time. On one hand you can say peasy the most simple and 
>> elegant , but also has the most powerful and adaptability , and does not 
>> need to sacrifice speed and efficiency.
>>  Its characteristics are as follows:
>>
>> * for the first time, peasy provides a simple and complete solution to 
>> the problem of left recursion in the hand-written recursive descent parser 
>> manner, no matter direct or indirect left-recursive rule . To handle left 
>> recursive in peasy, only one function is needed (only 20 lines of code in 
>> coffeescript).
>> * peasy allows programmers to write the parser with common programming 
>> language, and the parser and the input to the grammar rules are similar to 
>> the parser generator , both concise and readable , easy to modify and 
>> extend.
>> * the simple and flexible caching mechanism in peasy can improve the time 
>> efficiency of parsing, we can maintain a linear time for liner time 
>> complexity grammar by caching. For complex grammar and data , we can also 
>> improve time efficiency by caching mechanism, and avoide the exponential, 
>> cube or square time complexity like some other common parsing algorithm. 
>> peasy's cache implementation is very simple, only one function (only ten 
>> lines of code under coffeescript  ) .
>> * the entire peasy implementation is very simple. only two hundred lines 
>> of code under cofeescript. The concept is simple. Only two kind of 
>> components: some match functions which operate on the parsed data and 
>> parsing pointer and some combinational functions which generate match 
>> function. all of these functions are very simple , usually only a few lines 
>> of code , And they exists in the code of peasy for demonstration objective, 
>> can be deleted, modified and replaced, according to the specific needs.
>>
>> In fact, instead of to be considered as a library or a tool, maybe it's 
>> better to view peasy as a method, a way or some example to write parsers 
>> mannually. This will allow for a greater degree of power.
>>
>> peasy provide specific examples to support the above conclusion. The most 
>> obvious example is the common programming languages in order to adapt and 
>> modify their parser tool to prevent left- recursive grammar . For example 
>> python grammar (http://docs.python.org/2/reference/grammar.html), 
>> javascript grammar (
>> http://www-archive.mozilla.org/js/language/grammar14.html), while by 
>> using peasy, all binary arithmetic expression can be condensed into a 
>> simple left -recursive rules , left recursion elimination is not needed any 
>> more. For expression parsing, in order to further improve time efficiency , 
>> peasy also provides an example , has many different priorities for 
>> computing the call stack example can skip directly to the highest priority 
>> constant direct expression resolves to the final rather than, as is 
>> currently common practice, the rules of writing as follows : or -> (or | | 
>> and) | and; ...; add -> add + mul | mul; mul -> mul * atom | atom. Rare is 
>> , peasy while achieving enhanced capabilities of these expressions can also 
>> ensure that the linear time complexity of the parsing process. Want to know 
>> how peasy achieve these results? Please give a glimpse to peasy on github(
>> https://github.com/chaosim/peasy).
>>
>> the first two of following links is the messages I posted when I started 
>> to get the inspiration to start writing peasy. Because there are some of 
>> other tasks to finish, intermediate interrupted for some time. Not long 
>> ago, I use peasy to parse something in another projects , and I further 
>> optimize the api of peasy, and which is now very simple and elegant, and I 
>> am satisfied :) As for the most critical left recursive , and caching 
>> algorithms , withstood the subsequent examples of the test , there is no 
>> any change. Together, these examples also continues to demonstrate the 
>> advantages of peasy .
>>
>> Finally, I need to mention the impact of language on thought: Without 
>> coffeescript language, maybe it's difficult for me to have these ideas. in 
>> coffeescript,  the -> function notation and "assignment is an expression" 
>> making special concise readable grammar. In python similar results can be 
>> achieved, need to go through some twists and turns, and that is not so 
>> natural as in coffeescript.There should exists no big obstacle in other 
>> dynamic languages, but they is not as elegant as coffeescript tool. This 
>> point is obvious with the difference between the javascript code of peasy 
>> and its coffeescript source.
>>
>> messages I posted in google groups before:
>>
>> https://groups.google.com/forum/#!search/left$20recursive/comp.lang.javascript/GN7b9Tr6j98/DoiHzi9i77oJ
>>
>> https://groups.google.com/forum/#!search/%E5%B7%A6%E9%80%92%E5%BD%92/python-cn/Uqr-Xf3I3LM/oUd3Sj4HvxYJ
>>
>>
>> lr grammar and lex/yacc, bison, ply
>> http://en.wikipedia.org/wiki/LR_parser
>> http://dinosaur.compilertools.net/
>> http://en.wikipedia.org/wiki/Yacc
>> http://www.dabeaz.com/ply/
>> http://www.gnu.org/software/bison/
>> http://zaach.github.io/jison/
>>
>> LL grammar
>> http://en.wikipedia.org/wiki/LL_parser
>> http://www.antlr.org/
>>
>>
>> some one asked for parser to handle left recursion on SO:
>>
>> http://stackoverflow.com/questions/4397039/any-peg-parser-capable-to-handle-left-recursion
>>
>>
>> peg and packrat
>> http://bford.info/packrat/
>> http://en.wikipedia.org/wiki/Parsing_expression_grammar
>> http://cs.nyu.edu/rgrimm/xtc/rats-intro.html
>> http://java-source.net/open-source/parser-generators/rats!
>>
>> phoenix 
>>
>> http://www.boost.org/doc/libs/1_55_0/libs/spirit/phoenix/doc/html/index.html
>> http://www.ontolinux.com/community/phoenix/Phoenix_Manual.pdf
>>
>> ometa：
>> Alessandro Warth describe a method to handle left recursion in peg 
>> grammar in his PhD paper firstly.
>> After I have my idea about Peasy and algorith to handle left recursion, I 
>> searched and found his paper, and found my algorithm is actully similar to 
>> his method.
>> http://tinlizzie.org/ometa/
>> Memoization in Top-Down Parsing
>> http://www.tinlizzie.org/~awarth/johnson.html
>>
>> https://github.com/alexwarth/ometa-js
>>
>> parsing under python
>> https://wiki.python.org/moin/LanguageParsing
>>
>> parser for javascript or in javascript
>> http://esprima.org/
>> http://zaach.github.io/jison/
>>
>> -- 
>> -- 
>> Job Board: http://jobs.nodejs.org/
>> Posting guidelines: 
>> https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
>>
>> You received this message because you are subscribed to the Google
>> Groups "nodejs" group.
>> To post to this group, send email to [email protected]<javascript:>
>> To unsubscribe from this group, send email to
>> [email protected] <javascript:>
>> For more options, visit this group at
>> http://groups.google.com/group/nodejs?hl=en?hl=en
>>  
>> --- 
>> You received this message because you are subscribed to the Google Groups 
>> "nodejs" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected] <javascript:>.
>>
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>
>


-- 
-- 
Job Board: http://jobs.nodejs.org/
Posting guidelines: 
https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
You received this message because you are subscribed to the Google
Groups "nodejs" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/nodejs?hl=en?hl=en

--- 
You received this message because you are subscribed to the Google Groups 
"nodejs" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.

Re: [nodejs] With this utility, everyone can easily invent new languages

Reply via email to