RE: Context specific summary with the search term

Doug Cutting Tue, 23 Oct 2001 10:23:10 -0700

> From: Lee Mallabone [mailto:[EMAIL PROTECTED]]
> > 
> > How did the title ever get indexed as the title?  
> 
> I'm indexing HTML documents marked up with comments to indicate field
> boundaries. So I'd typically have:
> 
> <!--field:section_title-->
> blurb
> <!--field:text-->
> more blurb
> 
> and so on. The documents were indexed by looking for each field marker
> and then adding the subsequent lines to the relevant field.
> 
> In order to obtain a generic solution for context generation


If you're doing application-specific processing to extract fields from
documents, then a completely generic solution for extracting hit context
from documents is, by definition, impossible, since context extraction
requires field extraction.

> are you
> suggesting I write a method that takes plain text, (eg, text form of
> document) and a query, and assumes the plain text is in the query's
> default field?

I'm not exactly sure what you're proposing here, but, no, it doesn't sound
like something that I have suggested.

> This doesn't seem quite as useful as getContext(Hashset queryTerms,
> Reader originalDocument); which is what I was originally 
> aiming towards.

Such a method is easy to define if the Reader contains text from a single
field.  (Although you should probably pass in an Analyzer too.)  However, if
you're expecting such a method to automatically divide the text into fields,
then things will be harder, since Lucene's model is that applications divide
documents into fields.  So you could write an application-specific version
that divides fields automatically, or, to use more generic code, you could
call such a generic method once for each field of your document, leaving
field extraction in application-specific code.  Does that make sense?
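
As a rough sketch of that split, assuming the comment-marker format quoted
above: splitFields is the application-specific part, and getContext is the
generic part, called once per field.  The class name, both method names, the
window parameter and the crude lower-casing tokenizer are all hypothetical
stand-ins for illustration (a real version would take an Analyzer), not
anything that ships with Lucene.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContextSketch {
    // Application-specific: marker lines such as <!--field:section_title-->
    private static final Pattern MARKER = Pattern.compile("<!--field:(\\w+)-->");

    // Read a Reader fully into a String, wrapping any IOException.
    static String readAll(Reader r) {
        try {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) sb.append(buf, 0, n);
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Application-specific step: divide a marked-up document into fields.
    static Map<String, String> splitFields(Reader doc) {
        Map<String, String> fields = new LinkedHashMap<>();
        String current = null;
        for (String line : readAll(doc).split("\\r?\\n")) {
            Matcher m = MARKER.matcher(line.trim());
            if (m.matches()) {
                current = m.group(1);
                fields.putIfAbsent(current, "");
            } else if (current != null) {
                String soFar = fields.get(current);
                fields.put(current, soFar.isEmpty() ? line : soFar + "\n" + line);
            }
        }
        return fields;
    }

    // Generic step: given the text of ONE field, return up to `window` words
    // on either side of the first query term found, or null if none matches.
    static String getContext(Set<String> queryTerms, Reader fieldText, int window) {
        String[] words = readAll(fieldText).trim().split("\\s+");
        for (int i = 0; i < words.length; i++) {
            // Crude stand-in for an Analyzer: lower-case, drop punctuation.
            String token = words[i].toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
            if (queryTerms.contains(token)) {
                int from = Math.max(0, i - window);
                int to = Math.min(words.length, i + window + 1);
                return String.join(" ", Arrays.copyOfRange(words, from, to));
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String doc = "<!--field:section_title-->\nLucene overview\n"
                   + "<!--field:text-->\nLucene is a search library written in Java\n";
        Set<String> terms = new HashSet<>(Arrays.asList("search"));
        // Field extraction stays in application code; context extraction is
        // then called once for each field.
        for (Map.Entry<String, String> e : splitFields(new StringReader(doc)).entrySet()) {
            String ctx = getContext(terms, new StringReader(e.getValue()), 2);
            if (ctx != null) System.out.println(e.getKey() + ": " + ctx);
        }
        // prints: text: is a search library written
    }
}
```

Only splitFields knows about the comment markers, so a different document
format would need a different splitter but could reuse getContext unchanged.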

Doug
