Hi,

Is there a Tokenizer in Lucene that tokenizes XML correctly?

That is, one that takes the following XML:
<span>this is <span attr="foo">example</span>text.</span>

and produces these tokens (or similar):
<span> | this | is | <span attr="foo"> | example | </span> | text. | </span>

Or would I need to write such a Tokenizer myself?
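
To make the splitting I have in mind concrete, here is a rough plain-Java sketch. It is not a Lucene Tokenizer and uses no Lucene classes; the class name XmlSplitSketch and the method split are made up for illustration. It assumes tags never contain a '>' inside attribute values and ignores comments, CDATA sections, and entities.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only: not part of Lucene.
final class XmlSplitSketch {
    // A token is either a whole tag ("<...>") or a run of characters
    // that contains neither whitespace nor '<'.
    private static final Pattern TOKEN = Pattern.compile("<[^>]*>|[^\\s<]+");

    // Returns tags and text chunks in document order.
    static List<String> split(String xml) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(xml);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String xml = "<span>this is <span attr=\"foo\">example</span>text.</span>";
        // Print the tokens separated by " | ", as in the example above.
        System.out.println(String.join(" | ", split(xml)));
    }
}
```

A real Tokenizer would additionally have to report offsets and positions for each token, which this sketch leaves out.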

Regards,
Christoph Hermann

-- 
Christoph Hermann
Institut für Informatik
Tel: +49 761-203-8171 Fax: +49 761-203-8162
e-mail: herm...@informatik.uni-freiburg.de
