thanks for the ideas gang! I'll see how things progress soon. first task: rebuild v3 antlr parser in v3 and jam into v3 code base; it's in v2 now.
then might attack lexer stuff.
Ter

On Mar 26, 2011, at 11:36 AM, The Researcher wrote:

> On Sat, Mar 26, 2011 at 1:21 PM, Jason Doege <[email protected]> wrote:
>
>> Re: Scanner-less parsing
>>
>> The Parse::RecDescent module for Perl5 implements parsers without a
>> separate scanner and is what comes to mind when I hear the phrase
>> scanner-less. If you were to retain a scanner, I think the
>> characteristic that could provide the same function is to provide
>> context to the scanner so that when you go to get the next token, the
>> scanner only considers the type of token next expected in the current
>> alternative in the production. This way one could have multiple tokens
>> that might all match some text (but not others) and have the context of
>> the production resolve which one it was (so long as it matched, of
>> course).
>>
>> For instance, I might want to have separate token types for binary, hex
>> and decimal digits, but a scanner cannot tell which of the three it is
>> if the input is '0' or '1'. Hex overlaps with decimal for 0-9 and
>> overlaps with binary for 0-1 and potentially 'x', 'X', 'z' and 'Z' for
>> some implementations. There absolutely are other ways to handle this,
>> but there is a great deal of flexibility that comes from permitting
>> context to guide the scanner.
>>
>> Having to work through the ambiguity of lexer patterns was something
>> that was unexpected when I recently began working with ANTLR. I suspect
>> that this would not be the case for someone who is more accustomed to
>> using Lex/Yacc or comes from a more traditional or academic
>> parser-building background.
>>
>> Best regards,
>> Jason Doege
>>
>> On 3/25/2011 9:19 AM, The Researcher wrote:
>>>
>>> On Thu, Mar 24, 2011 at 2:32 PM, The Researcher <[email protected]> wrote:
>>>
>>>> On Thu, Mar 24, 2011 at 1:23 PM, Terence Parr <[email protected]> wrote:
>>>>
>>>>> added
>>>>>
>>>>> * Tree parser error handling should skip subtrees not nodes; these are
>>>>> programming errors not input errors. The flat stream makes it hard to
>>>>> resync.
>>>>>
>>>>> Ter
>>>>> On Mar 24, 2011, at 2:07 AM, Iztok Kavkler wrote:
>>>>>
>>>>>>> Howdy, I'm going to start augmenting ANTLR v3 significantly to create
>>>>>>> v4. The goal is backward compatibility; any new functionality, of
>>>>>>> course, will require altering or augmenting your grammars to take
>>>>>>> advantage of it. Here is my potential list of updates:
>>>>>>> http://www.antlr.org/wiki/display/ANTLR4/ANTLR+v4+Wish+list
>>>>>>>
>>>>>>> Anything to add or comment on?
>>>>>>>
>>>>>>> Ter
>>>>>>>
>>>>>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>>>>>>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>>>>>> A new error recovery mode for tree parsing:
>>>>>> When parsing ASTs, the ordinary error recovery strategies based on token
>>>>>> deletion/insertion are completely useless, because there are no man-made
>>>>>> syntax errors. In my experience, what you really want to do is the
>>>>>> following: assume that you have an error handler attached to some rule
>>>>>> and an error happens somewhere in the subtree of the node parsed by that
>>>>>> rule. When the handler catches an error, the parser must skip the
>>>>>> remainder of that subtree, otherwise the parser position is not
>>>>>> consistent with the grammar position anymore.
>>>>>> In AST implementations
>>>>>> that are based on pointers between nodes this happens automatically, but
>>>>>> Antlr's representation as a flat list of nodes with UP and DOWN tokens
>>>>>> requires some work - the parser has to keep track of the
>>>>>> current node's depth and skip the appropriate number of UP nodes
>>>>>> whenever an error is caught.
>>>>>>
>>>>>> Iztok
>>>>
>>>> 1. If my concept of scannerless parsing is the same as yours, then in the
>>>> generated code for a rule allow the true for "do {<rule code> }
>>>> while(true)" to be an attribute of the rule, i.e. exit. Obviously the value
>>>> would be true unless changed by a user. This would allow the user to have
>>>> control of when to exit the rule. By turning true into an attribute of the
>>>> rule, this allows for more control than gated semantic predicates.
>>>>
>>>> Based on my concept of scannerless parsing, there is no lexer and the
>>>> parser drives the reading of the tokens from the input stream. The input
>>>> stream does not generate the tokens ahead of time but only when needed. In a
>>>> quick proof of concept I had the token type passed from the parser as a
>>>> generic parameter, allowing the redefinition of the token returned by the
>>>> token stream. There were no pre-defined token values; they were dynamically
>>>> generated. To get the proof of concept to work required having a
>>>> cross-reference table between token types and token values.
>>>>
>>>> 2. If ANTLR 4 will allow the reading of binary data streams, then please
>>>> don't put char and line pos in a base class.
>>>> There could be one inherited
>>>> class that defines line and char pos, and another inherited class that
>>>> defines offset.
>>>>
>>>> Thanks
>>>>
>>>> Eric
>>>
>>> After finding Scannerless Generalized LR (SGLR), which I believe is closer
>>> to your meaning, my concept of scannerless parsing is different enough that
>>> the reference should be disregarded. I still submit the request for a
>>> rule to have an exit attribute.
>>>
>>> Thanks, Eric
>
> Jason, thanks for the info.
>
> Ter, with regards to wishes for ANTLR 4, I don't know how long you will be
> leaving the pipeline open, so I am sending in my wishes before the pipeline
> closes and the wishes aren't as polished as they should be.
>
> More in line with what I am wishing is that ANTLR 4 have more features
> accessible from the grammar for doing research; possibly via a research
> mode. The ability to manipulate the state machine for each rule is desired,
> along with the ability to manipulate the trees in a fashion similar to
> PROLOG.
>
> I know you have been considering LLVM, which I would truly like to see
> in ANTLR 4, and that may be the ingress I seek.
>
> I wish I could flesh out the details more, but maybe others can hop onto
> this suggestion.
>
> Thanks, Eric
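
Jason's suggestion in the quoted thread, letting the current production tell the scanner which token types to consider, might look roughly like the sketch below. It is plain Python for illustration only, not ANTLR code: the token names BIN, DEC and HEX, their patterns, and the ContextScanner class are all assumptions made up for this example.

```python
import re

# Hypothetical token types whose patterns deliberately overlap:
# "101" is a valid lexeme for all three.
TOKEN_PATTERNS = {
    "BIN": re.compile(r"[01xXzZ]+"),
    "DEC": re.compile(r"[0-9]+"),
    "HEX": re.compile(r"[0-9a-fA-FxXzZ]+"),
}


class ContextScanner:
    """Scanner that only tries the token types the parser says it expects."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def next_token(self, expected):
        """Return (type, lexeme) for the first expected type that matches here."""
        # Skip whitespace between tokens.
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1
        for token_type in expected:
            match = TOKEN_PATTERNS[token_type].match(self.text, self.pos)
            if match:
                self.pos = match.end()
                return token_type, match.group()
        raise SyntaxError(f"expected one of {expected} at offset {self.pos}")


# The same text is classified differently depending on what the "parser" asks for.
scanner = ContextScanner("101 101 101")
tokens = [
    scanner.next_token(["BIN"]),
    scanner.next_token(["DEC"]),
    scanner.next_token(["HEX"]),
]
```

Because the caller names the acceptable types at each step, the overlapping patterns never have to be made mutually exclusive in the scanner itself; a mismatch surfaces as an error at the point where the production expected something else.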

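
Iztok's point in the quoted thread about resyncing a tree parser over a flat node stream can be sketched as follows. This is a toy model in Python, assuming string labels and "DOWN"/"UP" marker nodes; the helper names flatten and skip_rest_of_subtree are hypothetical and are not part of ANTLR's runtime.

```python
DOWN, UP = "DOWN", "UP"  # navigation markers in the flat stream (toy stand-ins)


def flatten(tree):
    """Serialize a nested (label, children) tuple into a flat list with DOWN/UP."""
    label, children = tree
    out = [label]
    if children:
        out.append(DOWN)
        for child in children:
            out.extend(flatten(child))
        out.append(UP)
    return out


def skip_rest_of_subtree(stream, pos, open_levels):
    """Return the index just past the UP that closes the handler's subtree.

    pos is where the error was detected; open_levels is how many DOWNs inside
    the handler's subtree were still unclosed at that point (the depth the
    parser has to keep track of).
    """
    while pos < len(stream) and open_levels > 0:
        node = stream[pos]
        pos += 1
        if node == DOWN:
            open_levels += 1
        elif node == UP:
            open_levels -= 1
    return pos


tree = ("BLOCK", [
    ("ASSIGN", [("x", []), ("+", [("1", []), ("2", [])])]),
    ("PRINT", [("x", [])]),
])
stream = flatten(tree)

# Suppose the rule matching ASSIGN has the handler and the error is detected
# at the node "1" (index 7), two DOWNs deep relative to ASSIGN.
resume = skip_rest_of_subtree(stream, 7, 2)
```

After the skip, resume points at the sibling that follows the ASSIGN subtree, so the stream position agrees with the grammar position of the rule that caught the error.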