What about PathParser?
A parser has two parts - a tokenizer and a grammar.
Paths are composite - they have their own internal structure, and their
own grammar - so they are not tokens (cf. expressions).
Adding "/" as a token makes sense (if you look at the code you will see
that it is simply missing, as are a few others).
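
To illustrate the split, here is a rough sketch in plain Java. It is not
Jena code: PathSketch, Token, tokenize and parseSequence are made-up names,
and it only handles IRIs in angle brackets joined by "/". The tokenizer
emits flat IRI and SLASH tokens; the path only exists once a grammar rule
combines them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; none of these names come from Jena.
class PathSketch {
    enum Kind { IRI, SLASH }

    // A token is a flat lexical unit with no internal structure.
    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    // Tokenizer layer: characters in, IRI and SLASH tokens out.
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char ch = input.charAt(i);
            if (Character.isWhitespace(ch)) {
                i++;
            } else if (ch == '/') {
                tokens.add(new Token(Kind.SLASH, "/"));
                i++;
            } else if (ch == '<') {
                int end = input.indexOf('>', i);
                if (end < 0) {
                    throw new IllegalArgumentException("Unterminated IRI at " + i);
                }
                tokens.add(new Token(Kind.IRI, input.substring(i + 1, end)));
                i = end + 1;
            } else {
                throw new IllegalArgumentException("Unexpected '" + ch + "' at " + i);
            }
        }
        return tokens;
    }

    // Grammar layer: path ::= IRI ( SLASH IRI )*
    // Returns the IRIs of the sequence path in order.
    static List<String> parseSequence(String input) {
        List<Token> tokens = tokenize(input);
        List<String> steps = new ArrayList<>();
        int pos = 0;
        if (tokens.isEmpty() || tokens.get(pos).kind != Kind.IRI) {
            throw new IllegalArgumentException("Path must start with an IRI");
        }
        steps.add(tokens.get(pos++).text);
        while (pos < tokens.size()) {
            if (tokens.get(pos).kind != Kind.SLASH) {
                throw new IllegalArgumentException("Expected '/' at token " + pos);
            }
            pos++;
            if (pos >= tokens.size() || tokens.get(pos).kind != Kind.IRI) {
                throw new IllegalArgumentException("Expected an IRI after '/'");
            }
            steps.add(tokens.get(pos++).text);
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("<x:one>/<x:two>").size());  // prints 3
        System.out.println(parseSequence("<x:one>/<x:two>"));    // prints [x:one, x:two]
    }
}
```

Even this toy grammar needs its own error cases, and real paths add
alternatives, inverses, repetition and grouping on top, which is where a
generated parser starts to pay off.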
If you want a parser with a grammar, use JavaCC. We already have one
for paths and it's wrapped up in PathParser. It calls into the ARQ parser
and returns a Path.
The same can be done for any part of SPARQL - call the SPARQL parser.
TokenizerText is not a general tokenizer - it does not do any common
prefix matching beyond certain cases necessary for Turtle. It is carefully
constructed around that use case for speed.
Hand-coded parsers quickly get out of control. It's borderline for
Turtle, which has a very simple grammar over the tokens.
For a 2x speed-up, it seemed worth it.
Andy
On 03/01/17 20:32, Claude Warren wrote:
Should the TokenizerText parser be extended to parse paths,
so that it could parse something like "<x:one>/<x:two>"?
This would involve adding Path to Token as well as some other changes, but
does it make sense?
Claude