Re: TokenizerText parser and Path

Andy Seaborne Wed, 04 Jan 2017 01:02:41 -0800

What about PathParser?

A parser has two parts - a tokenizer and a grammar.

Paths are composite - they have their own internal structure, and their own grammar - so they are not tokens. cf. expressions

Adding "/" as a token makes sense (if you look at the code you will see that it is simply missing, as are a few others).
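
To illustrate the split, here is a minimal sketch in plain Java. It is not Jena code: PathSketch, Kind, Token, tokenize and parseSequence are hypothetical names made up for this example, and the grammar covers only IRIs joined by "/". The tokenizer produces flat tokens, with "/" as a token of its own; the composite path is only assembled by the grammar step on top.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names are Jena classes.
class PathSketch {
    enum Kind { IRI, SLASH }

    // A token is a flat lexical unit: a kind plus text, no internal structure.
    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    // Tokenizer: turns characters into IRI and SLASH tokens.
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '/') {
                tokens.add(new Token(Kind.SLASH, "/"));
                i++;
            } else if (c == '<') {
                int end = input.indexOf('>', i);
                if (end < 0) {
                    throw new IllegalArgumentException("Unterminated IRI at " + i);
                }
                tokens.add(new Token(Kind.IRI, input.substring(i + 1, end)));
                i = end + 1;
            } else {
                throw new IllegalArgumentException("Unexpected character '" + c + "' at " + i);
            }
        }
        return tokens;
    }

    // Grammar: path ::= IRI ( SLASH IRI )*
    // The composite result (the sequence of steps) is built here, not in the tokenizer.
    static List<String> parseSequence(List<Token> tokens) {
        List<String> steps = new ArrayList<>();
        int pos = 0;
        steps.add(expectIri(tokens, pos++));
        while (pos < tokens.size()) {
            if (tokens.get(pos).kind != Kind.SLASH) {
                throw new IllegalArgumentException("Expected '/' at token " + pos);
            }
            pos++;
            steps.add(expectIri(tokens, pos++));
        }
        return steps;
    }

    static String expectIri(List<Token> tokens, int pos) {
        if (pos >= tokens.size() || tokens.get(pos).kind != Kind.IRI) {
            throw new IllegalArgumentException("Expected IRI at token " + pos);
        }
        return tokens.get(pos).text;
    }

    public static void main(String[] args) {
        List<Token> tokens = tokenize("<x:one>/<x:two>");
        System.out.println(tokens.size() + " tokens");   // prints "3 tokens"
        System.out.println(parseSequence(tokens));       // prints "[x:one, x:two]"
    }
}
```

Even this toy grammar needs its own error handling and lookahead, which is why anything beyond a simple sequence is better left to a generated parser.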

If you want a parser with a grammar, use javacc. We already have one for paths and it's wrapped up in PathParser. It calls into the ARQ parser and returns a Path.


The same can be done for any part of SPARQL - call the SPARQL parser.

TokenizerText is not a general tokenizer - it does not do any common prefix matching other than certain cases necessary for Turtle. It is carefully constructed around that use case for speed.

Hand-coded parsers quickly get out of control. It's borderline for Turtle, which has a very simple grammar over the tokens.


For a 2x speedup, it seemed worth it.

        Andy

On 03/01/17 20:32, Claude Warren wrote:

Should the TokenizerText parser be extended to parse paths?

so it could parse something like "<x:one>/<x:two>"

This would involve adding Path to Token as well as some other changes, but
does it make sense?

Claude
