Nice suggestion about capping the number of tokens the highlighter processes; I'll add that in.
I agree, good suggestion.
I've had a quick look at your knowledge base docs. Can't you split them at index time into multiple smaller documents, using the <a name="xxx"> tags as document boundaries? Each Lucene document could then have a field containing the URL [sourcedoc]#xxx, taking you straight to the relevant section of the source document.
Ideally, yes. Unfortunately, I don't control what our customers put into their knowledge bases. Where boundaries are present, though, that's actually quite a good suggestion; thanks!
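The anchor-splitting idea above might look roughly like this. It is only a sketch with plain-Java string handling and no Lucene calls. The class name `AnchorSplitter`, the `Section` record, and the regex are hypothetical. Each resulting `Section` is what you would turn into one Lucene document, with `url()` stored as the [sourcedoc]#xxx field. Text before the first anchor links to the document itself. It assumes Java 16+ for records, and the bodies still contain HTML that a real indexer would strip before analysis.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchorSplitter {
    /** One indexable section: the deep-link URL and the raw text under that anchor (hypothetical type). */
    public record Section(String url, String body) {}

    // Matches <a name="xxx"> or <a name='xxx'>, case-insensitive; group 1 is the anchor name.
    private static final Pattern ANCHOR = Pattern.compile(
            "<a\\s+name\\s*=\\s*[\"']([^\"']+)[\"'][^>]*>", Pattern.CASE_INSENSITIVE);

    /** Splits html into sections at each named anchor; each would become one Lucene document. */
    public static List<Section> split(String sourceUrl, String html) {
        List<Section> sections = new ArrayList<>();
        Matcher m = ANCHOR.matcher(html);
        int start = 0;
        String currentUrl = sourceUrl; // text before the first anchor links to the page itself
        while (m.find()) {
            String body = html.substring(start, m.start());
            if (!body.isBlank()) {
                sections.add(new Section(currentUrl, body));
            }
            currentUrl = sourceUrl + "#" + m.group(1);
            start = m.end();
        }
        String tail = html.substring(start);
        if (!tail.isBlank()) {
            sections.add(new Section(currentUrl, tail));
        }
        return sections;
    }

    public static void main(String[] args) {
        String html = "<h1>KB</h1><a name=\"install\">Install steps<a name=\"faq\">FAQ text";
        for (Section s : split("http://kb.example.com/doc.html", html)) {
            System.out.println(s.url() + " -> " + s.body());
        }
    }
}
```

Where a customer's document has no anchors, `split` returns a single section pointing at the whole page, so the indexer can fall back to one document per page.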
Doug, would you accept optional storage of token offset information as a contribution to the Lucene core? Does anyone else think this information would be useful to have?
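To illustrate what stored offsets would buy a highlighter, here is a sketch that is not Lucene code. `OffsetHighlighter`, `Token`, and the toy `tokenize` method are hypothetical stand-ins for an analyzer and for whatever the index would persist. Once each token's start/end character offsets are kept alongside it, highlighting becomes substring splicing over the original text, with no need to re-analyze the document at query time. It assumes Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetHighlighter {
    /** A term plus the character offsets [start, end) where it appeared in the original text. */
    public record Token(String term, int start, int end) {}

    /** Hypothetical stand-in for an analyzer: splits on non-alphanumerics, lowercases, records offsets. */
    public static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && !Character.isLetterOrDigit(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) i++;
            if (i > start) {
                tokens.add(new Token(text.substring(start, i).toLowerCase(), start, i));
            }
        }
        return tokens;
    }

    /** Wraps each occurrence of queryTerm in <b> tags using only the stored offsets. */
    public static String highlight(String text, List<Token> stored, String queryTerm) {
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (Token t : stored) { // tokens are in document order, so spans never overlap
            if (t.term().equals(queryTerm)) {
                out.append(text, last, t.start())
                   .append("<b>").append(text, t.start(), t.end()).append("</b>");
                last = t.end();
            }
        }
        return out.append(text.substring(last)).toString();
    }
}
```

Because the original casing is read back from the source text via the offsets, "LUCENE" is highlighted as written even though the indexed term is lowercased.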
Regards,
Bruce Ritchie http://www.jivesoftware.com/