Re: Pegged, From EBNF to PEG

Dmitry Olshansky Sat, 17 Mar 2012 07:49:58 -0700

On 17.03.2012 18:11, Philippe Sigaud wrote:

On Sat, Mar 17, 2012 at 10:09, Dmitry Olshansky<[email protected]>  wrote:

Ok, let's agree on fact that semantically
a* is :
As<- a As / a

and a*? is this:
As<- a / a As

Now that's local ;)


It's local, yes. But the pb is with   Expr<- A* B C D, when D fails.
PEG sequences don't backtrack.


I'd argue they do. As I see it as:
Expr <- As B C D / B C D
As <- A / A As
(or use an epsilon production for As, is it allowed in pegged ?)

I remember trying compile-time backtracking in another similar library
in D 1-2 years ago and getting lots of pb. I might add that in Pegged,
but I don't know the consequences. How do you do that in std.regex?




There are two fundamental ways to get around the problem

(snip)

Thanks for the explanations. Does std.regex implement this?

It does the second way that I haven't told about, and had hard time toimplement in C-T :)And the simple version of what I presented (i.e. stack stuff) is usedin CT-regex and as fallback in R-T. The problem with doing a bitmapmemoization is that regex often used to _search_ things in large inputs.Some compromise and/or thresholds have to be estimated and found.Parsing on the contrary guaranties that the whole input is used or inerror, hence bitmap is not wasted.

Now if we put aside all crap talk about mathematical purity and monads, and
you name it, a packrat is essentially this, a dynamic programming trick
applied to recursive top-down parsing. The trick could be extended even more
to avoid extra checks on input characters, but the essence is this
"memoization".


I plan to store indices anyway. I might add this in the future. A read
something on using PEGs to parse Java and C and there was no need to
do a complete memoization: storing the last parse result as a
temporary seemed to be enough.

Well even straight no-memorization is somehow enough, it's just a way toensure no extra work is done. C & Java are not much of backtracky grammars.

(nice article btw, I learnt some about regexes)



Thanks, I hope it makes them more popular.
Might as well keep me busy fixing bugs :)


As people said on Reddit, you should make more whooping sounds about
CT-regex. That was what wowed me and pushed me to tackle CT-parsing
again.


It's just I'm also well aware of how much luck it (still) takes to fly ;)

The asserts that compare C-T vs R-T go off too often for my taste, otherthen this the memory usage during compile is too high.I recall once dmd had GC re-enabled for brief period, dmd used no morethen ~64Mb doing real crazy ct-regex stuff.

OK, back to C-T parsing, I have one crazy idea that I can't get awayfrom - add operator precedence grammar into the mix. From what I observeit should integrate into PEG smoothly. Then it would make"military-grade" hybrid that uses operator precedence parsing ofexpressions and the like. Few more tricks and it may beat some

existing parser generators.

See this post, where I tried to describe that idea early on:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=159562

I might catch spare time to go about adding it myself, the only trickything is to embed plain semantic actions, as AST generation would bemore or less the same.


--
Dmitry Olshansky

Re: Pegged, From EBNF to PEG

Reply via email to