On Sat, Mar 26, 2011 at 1:21 PM, Jason Doege <[email protected]> wrote:
> Re: Scanner-less parsing
>
> The Parse::RecDescent module for Perl5 implements parsers without a
> separate scanner and is what comes to mind when I hear the phrase
> scanner-less. If you were to retain a scanner, I think the
> characteristic that could provide the same function is to provide
> context to the scanner so that when you go to get the next token, the
> scanner only considers the type of token next expected in the current
> alternative in the production. This way one could have multiple tokens
> that might all match some text (but not others) and have the context of
> the production resolve which one it was, (so long as it matched, of
> course.)
>
> For instance, I might want to have separate token types for binary, hex
> and decimal digits, but a scanner can not tell which of the three it is
> if the input is '0' or '1'. Hex overlaps with decimal for 0-9 and
> overlaps with binary for 0-1 and potentially 'x', 'X', 'z' and 'Z' for
> some implementations. There absolutely are other ways to handle this,
> but there is a great deal of flexibility that comes from permitting
> context to guide the scanner.
>
> Having to work through the unambiguity of lexer patterns was something
> that was unexpected when I recently began working with ANTLR. I suspect
> that this would not be the case for someone who is more accustomed to
> using Lex/Yacc or comes from a more traditional or academic
> parser-building background.
>
> Best regards,
> Jason Doege
>
> On 3/25/2011 9:19 AM, The Researcher wrote:
> >
> > On Thu, Mar 24, 2011 at 2:32 PM, The Researcher<[email protected]> wrote:
> >
> >> On Thu, Mar 24, 2011 at 1:23 PM, Terence Parr<[email protected]> wrote:
> >>
> >>> added
> >>>
> >>> * Tree parser error handling should skip subtrees not nodes; these are
> >>> programming errors not input errors. The flat stream makes it hard to
> >>> resync.
> >>>
> >>> Ter
> >>> On Mar 24, 2011, at 2:07 AM, Iztok Kavkler wrote:
> >>>
> >>>>> Howdy, I'm going to start augmenting ANTLR v3 significantly to create
> >>>>> v4. The goal is backward compatibility; any new functionality, of
> >>>>> course, will require altering or augmenting your grammars to take
> >>>>> advantage of it. Here is my potential list of updates:
> >>>>> http://www.antlr.org/wiki/display/ANTLR4/ANTLR+v4+Wish+list
> >>>>>
> >>>>> Anything to add or comment on?
> >>>>>
> >>>>> Ter
> >>>>>
> >>>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> >>>>> Unsubscribe:
> >>>>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> >>>> A new error recovery mode for tree parsing:
> >>>> When parsing ASTs, the ordinary error recovery strategies based on
> >>>> token deletion/insertion are completely useless, because there are no
> >>>> man-made syntax errors. In my experience, what you really want to do
> >>>> is the following: assume that you have an error handler attached to
> >>>> some rule and an error happens somewhere in the subtree of the node
> >>>> parsed by that rule. When the handler catches an error, the parser
> >>>> must skip the remainder of that subtree, otherwise the parser position
> >>>> is not consistent with the grammar position anymore. In AST
> >>>> implementations that are based on pointers between nodes this happens
> >>>> automatically, but Antlr's representation as a flat list of nodes with
> >>>> UP and DOWN tokens means it requires some work - the parser has to
> >>>> keep track of the current node's depth and skip the appropriate number
> >>>> of UP nodes whenever an error is caught.
> >>>>
> >>>> Iztok
> >>
> >> 1. If my concept of scannerless parsing is the same as yours, then in
> >> the generated code for a rule allow the "true" in "do {<rule code> }
> >> while(true)" to be an attribute of the rule, i.e. exit. Obviously the
> >> value would be true unless changed by a user. This would allow the user
> >> to have control of when to exit the rule. Turning true into an
> >> attribute of the rule allows for more control than gated semantic
> >> predicates.
> >>
> >> Based on my concept of scannerless parsing, there is no lexer and the
> >> parser drives the reading of the tokens from the input stream. The
> >> input stream does not generate the tokens ahead of time but only when
> >> needed. In a quick proof of concept I had the token type passed from
> >> the parser as a generic parameter, allowing the redefinition of the
> >> token returned by the token stream. There were no pre-defined token
> >> values; they were dynamically generated. To get the proof of concept to
> >> work required having a cross-reference table between token types and
> >> token values.
> >>
> >> 2. If ANTLR 4 will allow the reading of binary data streams, then please
> >> don't put char and line pos in a base class. There could be one
> >> inherited class that defines line and char pos, and another inherited
> >> class that defines offset.
> >>
> >> Thanks
> >>
> >> Eric
> >>
> > After finding Scannerless Generalized LR (SGLR), which I believe is
> > closer to your meaning, my concept of scannerless parsing is different
> > enough that the reference should be disregarded.
> > I still submit the request for a rule to have an exit attribute.
> >
> > Thanks, Eric

Jason

Thanks for the info.

Ter, with regard to wishes for ANTLR 4, I don't know how long you will be
leaving the pipeline open, so I am sending in my wishes before it closes,
even though they aren't as polished as they should be.

More in line with what I am wishing is that ANTLR 4 have more features
accessible from the grammar for doing research, possibly via a research
mode. The ability to manipulate the state machine for each rule is desired,
along with the ability to manipulate the trees in a fashion similar to
PROLOG. I know you have been considering LLVM, which I would truly like to
see in ANTLR 4, and that may be the ingress I seek.

I wish I could flesh out the details more, but maybe others can hop onto
this suggestion.

Thanks, Eric

--
You received this message because you are subscribed to the Google Groups "il-antlr-interest" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/il-antlr-interest?hl=en.
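
For readers following the quoted tree-parser discussion, here is a minimal Python sketch of the recovery Iztok describes: when an error is caught, skip the rest of the current subtree in a flat node stream by tracking depth across DOWN and UP markers. The function name skip_rest_of_subtree, its parameters, and the string markers are hypothetical and chosen only for illustration; this is not ANTLR's API or its generated code.

```python
# Hypothetical markers standing in for the DOWN/UP navigation nodes
# of a flattened tree stream.
DOWN, UP = "DOWN", "UP"


def skip_rest_of_subtree(nodes, pos, depth, entry_depth):
    """Return the index just past the UP that closes the failed subtree.

    nodes       -- flat node stream, e.g. A DOWN B DOWN C D UP E UP
    pos         -- index of the next unread node when the error was caught
    depth       -- nesting depth at pos (count of unmatched DOWNs so far)
    entry_depth -- nesting depth when the rule owning the handler was entered

    Assumes the error was raised inside the rule's child list, i.e.
    depth > entry_depth; otherwise nothing is skipped.
    """
    while depth > entry_depth and pos < len(nodes):
        node = nodes[pos]
        if node == DOWN:
            depth += 1
        elif node == UP:
            depth -= 1
        pos += 1
    return pos


# Flat form of the tree ^(A ^(B C D) E).
nodes = ["A", DOWN, "B", DOWN, "C", "D", UP, "E", UP]
# A rule matching ^(B ...) was entered at depth 1 and failed on D
# (index 5, depth 2); parsing resumes at the sibling after B's subtree.
resume = skip_rest_of_subtree(nodes, 5, depth=2, entry_depth=1)
print(nodes[resume])  # E
```

The sketch only covers errors raised below the rule's root node; an error on the root itself, before its DOWN is consumed, would need an extra check for a following DOWN.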
