On 17.03.2012 18:11, Philippe Sigaud wrote:
On Sat, Mar 17, 2012 at 10:09, Dmitry Olshansky<[email protected]>  wrote:

Ok, let's agree on fact that semantically
a* is :
As<- a As / a

and a*? is this:
As<- a / a As

Now that's local ;)

It's local, yes. But the pb is with   Expr<- A* B C D, when D fails.
PEG sequences don't backtrack.

I'd argue they do. As I see it as:
Expr <- As B C D / B C D
As <- A / A As
(or use an epsilon production for As, is it allowed in pegged ?)



I remember trying compile-time backtracking in another similar library
in D 1-2 years ago and getting lots of pb. I might add that in Pegged,
but I don't know the consequences. How do you do that in std.regex?



There are two fundamental ways to get around the problem
(snip)

Thanks for the explanations. Does std.regex implement this?


It does the second way that I haven't told about, and had hard time to implement in C-T :) And the simple version of what I presented (i.e. stack stuff) is used in CT-regex and as fallback in R-T. The problem with doing a bitmap memoization is that regex often used to _search_ things in large inputs. Some compromise and/or thresholds have to be estimated and found. Parsing on the contrary guaranties that the whole input is used or in error, hence bitmap is not wasted.


Now if we put aside all crap talk about mathematical purity and monads, and
you name it, a packrat is essentially this, a dynamic programming trick
applied to recursive top-down parsing. The trick could be extended even more
to avoid extra checks on input characters, but the essence is this
"memoization".

I plan to store indices anyway. I might add this in the future. A read
something on using PEGs to parse Java and C and there was no need to
do a complete memoization: storing the last parse result as a
temporary seemed to be enough.


Well even straight no-memorization is somehow enough, it's just a way to ensure no extra work is done. C & Java are not much of backtracky grammars.

(nice article btw, I learnt some about regexes)


Thanks, I hope it makes them more popular.
Might as well keep me busy fixing bugs :)

As people said on Reddit, you should make more whooping sounds about
CT-regex. That was what wowed me and pushed me to tackle CT-parsing
again.

It's just I'm also well aware of how much luck it (still) takes to fly ;)
The asserts that compare C-T vs R-T go off too often for my taste, other then this the memory usage during compile is too high. I recall once dmd had GC re-enabled for brief period, dmd used no more then ~64Mb doing real crazy ct-regex stuff.

OK, back to C-T parsing, I have one crazy idea that I can't get away from - add operator precedence grammar into the mix. From what I observe it should integrate into PEG smoothly. Then it would make "military-grade" hybrid that uses operator precedence parsing of expressions and the like. Few more tricks and it may beat some
existing parser generators.

See this post, where I tried to describe that idea early on:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=159562

I might catch spare time to go about adding it myself, the only tricky thing is to embed plain semantic actions, as AST generation would be more or less the same.

--
Dmitry Olshansky

Reply via email to