On Nov 13, 2007, at 11:59 AM, Steven D. Majewski wrote:

Lucene is great at finding documents, but not quite as good at finding things IN documents. The index contains pointers to the terms, but they point to token positions in the parsed token stream, so to find a character index into a file you have to (I believe) run the text through the tokenizer again. (But the Lucene API gives you access to everything, even if it's not simple or easy. I think there are some new features in the latest version that can make this sort of thing easier, but I haven't yet figured out how to use them.)


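You can use Term Vectors to access the offset (and position) information for a document, as long as the field was indexed with term vectors that store offsets (e.g. Field.TermVector.WITH_POSITIONS_OFFSETS).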
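To make the reply concrete without depending on a particular Lucene release, here is a toy, in-memory model of what a term vector with offsets records for one document: for each term, the token position of every occurrence plus its start and end character offsets. It is plain Java, not Lucene's API. The class name ToyTermVector, the letters-and-digits tokenizer, and the sample text are assumptions made for illustration. Because the offsets are captured during the single tokenizing pass, finding a term's character index later needs no second trip through the tokenizer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a per-document term vector that stores
// positions and offsets. Not Lucene code.
public class ToyTermVector {

    /** One occurrence of a term in the document. */
    static final class Occurrence {
        final int position;    // index of the token in the token stream
        final int startOffset; // character index of the first char (inclusive)
        final int endOffset;   // character index just past the last char (exclusive)

        Occurrence(int position, int startOffset, int endOffset) {
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /**
     * Tokenizes on runs of letters/digits, lowercases each token, and records
     * term -> occurrences (position plus character offsets) in one pass.
     */
    static Map<String, List<Occurrence>> build(String text) {
        Map<String, List<Occurrence>> vector = new TreeMap<>();
        int position = 0;
        int i = 0;
        while (i < text.length()) {
            if (!Character.isLetterOrDigit(text.charAt(i))) {
                i++; // skip separators
                continue;
            }
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            String term = text.substring(start, i).toLowerCase();
            vector.computeIfAbsent(term, k -> new ArrayList<>())
                  .add(new Occurrence(position++, start, i));
        }
        return vector;
    }

    public static void main(String[] args) {
        String text = "Lucene finds documents; Lucene can also find things in documents.";
        Map<String, List<Occurrence>> vector = build(text);
        // The stored offsets index straight into the original text.
        for (Occurrence occ : vector.get("documents")) {
            System.out.println("documents pos=" + occ.position
                    + " chars [" + occ.startOffset + ", " + occ.endOffset + ") -> \""
                    + text.substring(occ.startOffset, occ.endOffset) + "\"");
        }
    }
}
```

Running main should print the two occurrences of "documents": position 2 at characters [13, 22) and position 9 at characters [55, 64).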
