Isaac Dupree wrote: > apfelmus wrote: >> Mark T.B. Carroll wrote: >>> I've been playing with Text.Parsers.Frisby to see how it stacks against >>> other options and, while it's been great so far, I am finding that I >>> can't encode a grammar where what's acceptable depends on what's already >>> been parsed in some nontrivial way. >>> [...] >>> Is this supposed to not be possible in Frisby, or (quite likely) am I >>> missing something that allows me to? >> It's intentionally impossible. Frisby uses a dynamic programming >> approach that crucially depends on the fact that the grammar in question >> is context-free (actually something related, but the effect is the >> same). > > Is that dependence crucial? What if it gained Monad operations that just > weren't intended to be used very often, and maybe locally harmed > performance a little where they are used?
Now that you ask, I become unsure :) The actual details of packrat parsing are written down in B. Ford. Packrat Parsing: Simple, Powerful, Lazy, Linear Time. http://pdos.csail.mit.edu/~baford/packrat/icfp02/ There's a small section about "Monadic Packrat Parsing" but I'm not sure about its significance. The following discussion may shed light on this. First a different explanation of packrat parsing. It can be understood as a variant of the O(n^3) Cocke, Younger, Kasami parsing algorithm for context-free grammars (coincidentially recently discussed at http://article.gmane.org/gmane.comp.lang.haskell.cafe/22850). First, we rearrange the table gs i j nt = substring starting at position j of length i can be derived by nonterminal nt = function of gs i' j' nt' for j'>=j, j+i>=j'+i' and any nt' ¹ as fs j nt = [i | the substring starting at j with a length i is a derivation from the nonterminal nt] Then, packrat parsing basically throws out non-determinism (which changes the semantics of the context-free grammar): packrat j nt = minimum (fs j nt) = function of (packrat j' nt') for j'>=j and any nt' ¹ and that's about it. In the aforementioned paper, this table is very implicit but it's there: the indices i and j are present as memory pointers to different incarnations of the data structure Derivs. Also, the constructed values (i.e. values of type a for P s a) are stored in the table. Now, declaring a parser in the Frisby library builds up the table structure. Every newRule introduces a new non-terminal and an associated column in the table. This means that at least every non-terminal must be "side-effect free", i.e. the result (packrat j nt) may only depend on the substring starting at j but not on the results from previous parses. But it seems that the dependence itself indeed may incorporate context-sensitive behavior. In other words, you may decide freely how (packrat j nt) is calculated from (packrat j' nt'). In particular, you can choose the j' to incorporate based on parsing resulsts from them. Here's an artificial pseudo-code example: weird <- newRule $ do b <- parse boolean if b == True then parse number else parse date -- assumed helper stuff boolean <- newRule $ ... number <- newRule $ ... date <- newRule $ ... The decision of constructing the result of the nonterminal 'weird' from parsing a date or parsing a number depends on whether we parsed True or False before. In this case, there are no unexpected run-time penalities and it appears that this can already be implemented using the bare-hands machinery from the paper but that this cannot be implemented in Frisby (?). However, it's not possible to assign non-terminals to the parts that parse differently depending on context. In the example, we cannot factorize weird <- newRule $ do b <- parse boolean parse (helper b) helper <- newRule $ \b -> do if b == True then parse number else parse date and assign 'helper' a non-terminal. Somehow, this apparently doesn't work anyway because newRule doesn't accept functions as arguments. So, it seems that we can just turn (P s) into a monad without regret! Run-time penalities should only occur if we recurse on something that didn't get memoized via newRule. In a sense, is 'newRule' the only primitive (should probably get a different name like 'memoize') that makes packrat parsing different and faster than other monadic parsers? > BTW: (P s) should be an instance of Applicative (which is already > possible with Frisby's current code, just not there) (I prefer the > aesthetics of Frisby-applicative code to many of the combinators it > provides - I tried it a little sometime, not enough to send a darcs patch) Yes, I think too that it definitely should be made an instance of Applicative. For parsing, I prefer that to Monad anyway :) Regards, apfelmus ¹ The >= sometimes must be > to avoid a <<loop>>, but that's immaterial here. _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe