From the following link, it looks like we can call the lexer on its own to get
tokens, independently of the parser.

http://www.antlr.org/wiki/display/ANTLR3/1.+Lexer

Here is the example from that page, which gives me that hope:

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;

public class MainLexer {
    public static void main(String[] args) {
        try {
            // Read the input file named on the command line.
            CharStream input = new ANTLRFileStream(args[0]);
            XMLLexer lexer = new XMLLexer(input);
            Token token;
            // Pull tokens straight from the lexer until end of input.
            while ((token = lexer.nextToken()).getType() != Token.EOF) {
                System.out.println("Token: " + token.getText());
            }
        } catch (Throwable t) {
            System.out.println("Exception: " + t);
            t.printStackTrace();
        }
    }
}

I don't know whether CharStream or XMLLexer has a constructor or a factory
method that takes a String, which is what we'd probably use within FOP.
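
For illustration, the general shape of a lexer that reads from a String and
hands out tokens with no parser attached can be sketched in plain Java as
below. This is a minimal hand-written sketch, not ANTLR and not FOP code: the
class StringLexerSketch, its Tok type and the token categories are all
hypothetical names chosen for the example. On the ANTLR side, the ANTLR 3
runtime does appear to ship an ANTLRStringStream class that wraps a String,
which would be the natural thing to try in place of ANTLRFileStream.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer for illustration; not part of ANTLR or FOP.
class StringLexerSketch {
    enum Type { NUMBER, IDENT, SYMBOL, EOF }

    static final class Tok {
        final Type type;
        final String text;
        Tok(Type type, String text) { this.type = type; this.text = text; }
        @Override public String toString() { return type + ":" + text; }
    }

    private final String input;
    private int pos = 0;

    // The lexer is constructed directly from a String; no file or parser involved.
    StringLexerSketch(String input) { this.input = input; }

    // Returns the next token, or an EOF token once the input is exhausted.
    Tok nextToken() {
        // Skip whitespace between tokens.
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        if (pos >= input.length()) {
            return new Tok(Type.EOF, "");
        }
        int start = pos;
        char c = input.charAt(pos);
        if (Character.isDigit(c)) {
            while (pos < input.length() && Character.isDigit(input.charAt(pos))) {
                pos++;
            }
            return new Tok(Type.NUMBER, input.substring(start, pos));
        }
        if (Character.isLetter(c)) {
            while (pos < input.length()
                    && (Character.isLetterOrDigit(input.charAt(pos)) || input.charAt(pos) == '-')) {
                pos++;
            }
            return new Tok(Type.IDENT, input.substring(start, pos));
        }
        // Anything else is emitted as a one-character symbol token.
        pos++;
        return new Tok(Type.SYMBOL, String.valueOf(c));
    }

    // Drains the lexer in the same loop shape as the ANTLR example above.
    static List<String> tokenize(String s) {
        StringLexerSketch lexer = new StringLexerSketch(s);
        List<String> out = new ArrayList<String>();
        for (Tok t = lexer.nextToken(); t.type != Type.EOF; t = lexer.nextToken()) {
            out.add(t.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String t : tokenize("1pt solid")) {
            System.out.println("Token: " + t);
        }
    }
}
```

The point of the sketch is only that the token loop needs nothing beyond the
lexer object and a String; a generated lexer would replace the hand-written
nextToken() body.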

Best Regards,
Jonathan S. Levinson

-----Original Message-----
From: Vincent Hennebert [mailto:vhenneb...@gmail.com] 
Sent: Thursday, October 08, 2009 5:15 AM
To: fop-dev@xmlgraphics.apache.org
Subject: Re: Regular expression use

Hi Jonathan,

Jonathan Levinson wrote:
> I'm sure someone has mentioned it already but what about the lexer support in 
> ANTLR?
> 
> http://www.antlr.org/wiki/display/ANTLR3/FAQ+-+Lexical+analysis
> 
> ANTLR is available under the BSD license, which seems to be one with no 
> strings attached:
> 
> http://www.antlr.org/license.html

Basically we’re back to the same discussion as the one about the parser
generator, this time at the lexer level.
http://markmail.org/thread/64rmyl7x4nyoxhh3

Among the tools mentioned in the above thread, it would be good to know
which ones allow the lexer to be used independently of the parser. Unless we
decide to use both the lexer and parser anyway...


Vincent


> Best Regards,
> Jonathan S. Levinson
> 
> -----Original Message-----
> From: Vincent Hennebert [mailto:vhenneb...@gmail.com] 
> Sent: Wednesday, October 07, 2009 6:51 AM
> To: fop-dev@xmlgraphics.apache.org
> Subject: Re: Regular expression use
> 
> Hi Jonathan,
> 
> Jonathan Levinson wrote:
>> I noticed that if one is not careful in one's regular expression use,
>> the compilation of a regular expression can take minutes.  I'm not
>> talking about applying the pattern, just compiling it!
>>
>>  
>>
>> Should regular expressions be avoided altogether in favor of
>> hand-crafted state machines for parsing and tokenizing, or can regular
>> expressions be used as long as one is careful?
> 
> I’d say, use regular expressions as long as they are not too complex.
> But I guess you’re mentioning that in the context of property parsing,
> in which case I don’t think regular expressions are the ultimate answer.
> A proper lexer is likely to be needed, either generated or written by
> hand. As the latter solution quickly becomes a maintenance nightmare,
> some lexer generator will probably be needed. The question remains which
> one, and I’m not even sure one exists whose license is
> ASLv2-compatible. Plus there are some issues specific to property
> parsing, like shorthands (which should ideally re-use the parsers of the
> individual properties), sub-properties, etc.
> 
> 
> Vincent
