Hi all,

A couple of days ago I posted a message to the user list asking about some of the API hooks that appear to be intended for excerpt generation, and I haven't heard anything back yet.
I'd like some feedback on an idea for extending Lucene to hold the extra information it needs, so that I don't have to reparse the entire body text again to generate excerpts.

To work out which sections of the text contain the terms that produced the hit most frequently, I need the positions of those terms in the document. AFAICS this information is already stored, but it isn't accessible from a Hits object. It would be nice to make it available somehow.

Also, to work out where those terms were in the original document, I'd like to store, and be able to retrieve, the start and end offsets of each term within the original field. This information is currently attached to the Token object produced during analysis but, AFAICS, is not stored in the index.

I'm not sure whether the best place for it would be an extension to the existing segments or a separate segment file. I haven't really spent enough time looking at the mechanics of the files yet.

I'd really appreciate it if someone who understands how things work underneath could say "That sounds great, but try it like this", or "Don't do anything, we're currently implementing something similar", or even "You idiot, look at http://xyz/ to do that".

Thanks,
Tom

--
Tom Dunstan
Mobile 0417 895 244
Intec Consulting Group
PO Box 7012 Hutt Street
Level 1, 1 Hutt Street
Adelaide 5000
Tel +61 8 8359 2332
Fax +61 8 8359 2264
Email: [EMAIL PROTECTED]
Website: www.intecgroup.com.au
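For concreteness, here is a rough Java sketch showing how stored positions and offsets could drive excerpt selection without reparsing the body text. It assumes, hypothetically, that the index could return each occurrence of a query term in a document as a (position, startOffset, endOffset) triple. TermOccurrence, bestExcerpt and the window size are illustrative names only, not existing Lucene API. The sketch uses a sliding window over token positions to find the densest cluster of matches, then uses the stored character offsets to cut the excerpt straight out of the original field text.

```java
import java.util.Arrays;
import java.util.List;

public class ExcerptSketch {

    /** One occurrence of a matching term (hypothetical type, not a Lucene class). */
    static final class TermOccurrence {
        final int position;    // token position within the field
        final int startOffset; // character offset where the token starts
        final int endOffset;   // character offset just past the token's end

        TermOccurrence(int position, int startOffset, int endOffset) {
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /**
     * Returns the text covering the largest run of occurrences whose positions
     * span fewer than windowSize tokens. Occurrences must be sorted by position.
     */
    static String bestExcerpt(String fieldText, List<TermOccurrence> hits, int windowSize) {
        if (hits.isEmpty()) {
            return "";
        }
        int bestStart = 0, bestEnd = 0; // inclusive indices into hits
        int left = 0;
        for (int right = 0; right < hits.size(); right++) {
            // Advance the left edge until the window spans fewer than windowSize positions.
            while (hits.get(right).position - hits.get(left).position >= windowSize) {
                left++;
            }
            // Keep the window that contains the most occurrences.
            if (right - left > bestEnd - bestStart) {
                bestStart = left;
                bestEnd = right;
            }
        }
        // Map the chosen token range back to the original text via the stored offsets.
        int from = hits.get(bestStart).startOffset;
        int to = hits.get(bestEnd).endOffset;
        return fieldText.substring(from, to);
    }

    public static void main(String[] args) {
        String text = "the cat sat on the mat while the cat slept";
        // Hypothetical stored data for the query terms "cat" and "slept".
        List<TermOccurrence> hits = Arrays.asList(
                new TermOccurrence(1, 4, 7),
                new TermOccurrence(8, 33, 36),
                new TermOccurrence(9, 37, 42));
        System.out.println(bestExcerpt(text, hits, 3)); // prints "cat slept"
    }
}
```

The point of the design is that once positions and offsets are retrievable, choosing the excerpt is a single linear pass over the matches, and the original text is only sliced, never re-tokenized.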