On Nov 13, 2007, at 11:59 AM, Steven D. Majewski wrote:
Lucene is great at finding documents, but not quite as good at finding
things IN documents. The index contains pointers to the terms, but
they are
pointers to a token in the parsed token stream, so to find a
character index
into a file, you have to (I believe) run the text thru the tokenizer
again.
( But lucene API gives you access to everything, even if it's not
simple or easy.
I think there are some new features in the latest version that can
make this
sort of thing easier, but I haven't yet figured out how to use
them. )
You can use Term Vectors to access the offset (and position)
information for a document.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]