Hans Aberg wrote:
On 7 Mar 2006, at 18:42, Sylvain Schmitz wrote:
Too many states means less chances of merging concurrent stacks. One
could also point out that less inadequate states means less branching,
but the number of inadequate states avoided when using LR(1) instead of
LALR(1) is not very big, whereas the difference in automaton size is
exponential. Thus LR(1) is much more practicable if you merge states.
My guess is that this exponentiality of states refers to the difference
between LR(0) and LR(1).
It does.
There is a similar exponentiality in the NFA
to DFA translation. Then there are two methods around this problem: the
common, ignoring the issue, as typical regular expression grammars do
not hit this exponentiality. Alternatively, one can make a runtime
NFA-DFA translation, caching the states, though I have never seen such
an implementation.
So, as for LR(1), I guess for typical grammars there will be a blow- up
a couple of times of the number of states. If there is a problem with
exponential blowups, one might perhaps try the same approach, make a
grammar to non-deterministic automata translation, with a runtime
component computing the final deterministic push-down automaton states
at need, caching them.
I have never seen this kind of implementations either. I can see one
reason for that: when we construct the deterministic handle finding
automaton, we also get information on whether the LR parser is
deterministic or not, and on where the conflicts are.
One could perhaps use a LR(1) testing algorithm, which is supposed to
be much less costly than the complete parser generation, to check
whether the determinization will work or not---I don't know enough about
these tests to tell whether they also give you where the conflicts
occur---and do the NFA to DFA translation on parsing time like you suggest.
A second reason is simply that, for purely parsing purposes---that
is, ignoring the advantages you already mentioned, like correct prefix +
lookahead---, we do not need the full-blown LR(1) automaton; its
compressed form works just fine; even the weaker LALR(1) parser is good
enough in most cases. The space constrains experienced in the days
where LR parsing was invented did not leave much choice on the matter.
As for Bison, I think it should at some point in the future have a LR(1)
feature, so that one at least can experiment with it.
Indeed, you now have the choice.
--
Regards,
Sylvain
_______________________________________________
[email protected] http://lists.gnu.org/mailman/listinfo/help-bison