On Mon, 2001-10-22 at 17:43, Doug Cutting wrote: > > I'm trying to implement this and should be able to contribute any > > succesful results, but I need to produce context on a per-field basis. > > How did the title ever get indexed as the title? Presumably you split the > document into fields when it was indexed. Similarly, if you re-tokenize > things a field at a time then you should always know which field you are in, > no?
I'm indexing HTML documents marked up with comments to indicate field boundaries. So I'd typically have: <!--field:section_title--> blurb <!--field:text--> more blurb and so on. The documents were indexed by looking for each field marker and then adding the subsequent lines to the relevant field. In order to obtain a generic solution for context generation are you suggesting I write a method that takes plain text, (eg, text form of document) and a query, and assumes the plain text is in the query's default field? This doesn't seem quite as useful as getContext(Hashset queryTerms, Reader originalDocument); which is what I was originally aiming towards. Regards, -- Lee Mallabone
