I have a couple of quick questions. They might just be because I haven't
looked at this in a week (I got pulled away onto some other work that had to
take priority).

In the searching phase, I would run the search across all page documents,
and then, for each matching page, run a second search with
PayloadSpanUtil.getPayloadsForQuery restricted to that page, so that I only
get the payloads for one page at a time.  The function returns a Collection of
payloads as far as I can tell, so is there any way of knowing which payloads
go together?  That is to say, if you searched for "lucene rocks" on a page
and the phrase appeared 3 times, you would get back 6 payloads in total.  Is
there a quick way of knowing how to group them within the collection?
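
To make the question concrete, here is a rough sketch of the grouping I
would like to end up with. It assumes two things I have not confirmed: that
each payload is a byte[], and that the collection comes back in match order
with one payload per query term. PayloadGrouper and groupByMatch are names I
made up for illustration; they are not part of Lucene.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class PayloadGrouper {

    /**
     * Hypothetical helper (not a Lucene API): splits a flat collection of
     * payloads into consecutive groups of termsPerMatch.
     * ASSUMPTION: payloads arrive in match order, one per query term.
     */
    public static List<List<byte[]>> groupByMatch(Collection<byte[]> payloads,
                                                  int termsPerMatch) {
        if (termsPerMatch <= 0) {
            throw new IllegalArgumentException("termsPerMatch must be positive");
        }
        if (payloads.size() % termsPerMatch != 0) {
            // The ordering assumption cannot hold if the counts do not line up.
            throw new IllegalArgumentException(
                "payload count is not a multiple of termsPerMatch");
        }
        List<List<byte[]>> groups = new ArrayList<>();
        List<byte[]> current = new ArrayList<>(termsPerMatch);
        for (byte[] payload : payloads) {
            current.add(payload);
            if (current.size() == termsPerMatch) {
                groups.add(current);                      // one complete match
                current = new ArrayList<>(termsPerMatch); // start the next one
            }
        }
        return groups;
    }
}
```

For the "lucene rocks" example, calling groupByMatch(payloads, 2) on the 6
payloads would give 3 groups of 2. If the ordering assumption is wrong, this
chunking would silently pair up the wrong payloads, which is really the heart
of my question.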

Also, I need a way of seeing the words that come before or after a match on
the page.  The quick answer would be to store the next and previous word in
the meta-data, but that doesn't scale and would mean reindexing everything
if I wanted to change the number of words stored.  My thought was to have
a second index of words that stores the text, the page information, and the
position of each word on the page.  That position can then be stored in the
meta-data in the first index, so if we know we got back word 5, we can run
queries to get words 4 and 6 from the same page.  Does this make sense, or is
there a way that would perform better?
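
Here is a rough in-memory sketch of the lookup I have in mind, with a plain
map standing in for the second index. PageWordIndex, addPage, and context are
hypothetical names, positions are 0-based, and splitting on whitespace is only
a placeholder for whatever analyzer the real index would use.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PageWordIndex {

    // Stand-in for the second index: page id -> words in page order.
    private final Map<String, List<String>> wordsByPage = new HashMap<>();

    /** Records the words of a page; whitespace splitting is a placeholder. */
    public void addPage(String pageId, String text) {
        List<String> words = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        wordsByPage.put(pageId, words);
    }

    /**
     * Returns the word at the given 0-based position together with up to
     * 'window' words on each side, clipped at the page boundaries.
     */
    public List<String> context(String pageId, int position, int window) {
        List<String> words = wordsByPage.get(pageId);
        if (words == null || position < 0 || position >= words.size()) {
            return new ArrayList<>(); // unknown page or position out of range
        }
        int from = Math.max(0, position - window);
        int to = Math.min(words.size(), position + window + 1);
        return new ArrayList<>(words.subList(from, to));
    }
}
```

The point of the design is that the window size is chosen at query time, so
changing the number of surrounding words would not require reindexing. What I
can't judge is what the extra queries per match would cost against a real
index.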

Thanks,
Greg
