I have a couple of quick questions. It might just be because I haven't looked at this in a week (I got pulled away onto some other work that had to take priority).
In the searching phase, I would run the search across all page documents, and then, for each of those pages, do a second search with PayloadSpanUtil.getPayloadsForQuery, restricted so that it only returns the payloads for one page at a time.

As far as I can tell, the function returns a Collection of payloads, so is there any way of knowing which payloads go together? That is to say, if you searched for "lucene rocks" on a page and the phrase appeared 3 times, you would get back 6 payloads in total (one per term per match). Is there a quick way of knowing how to group them in the collection?

Also, I need a way of seeing the words that came before or after a match on the page. The quick answer would be to store the next and previous word in the meta-data, but this doesn't scale and would mean reindexing everything if I wanted to change the number of words stored.

My thought was to have another index of words that stores the text, the page information, and the index of that word on the page. That word index could then be stored in the meta-data in the first index, so if we know we got back word 5, we can run queries to get words 4 and 6 from the same page.

Does this make sense, or is there a way that would be better for performance?

Thanks,
Greg
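
P.S. To make the two ideas concrete, here is a rough sketch in plain Java of what I have in mind. It does not call Lucene at all. The class WordContextSketch and its methods are hypothetical names, the map is only a stand-in for the proposed second index, and it assumes each payload can be decoded to the word's position on the page. The grouping part also assumes an exact phrase query, so every match contributes the same number of payloads.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the proposed second index; no Lucene calls here.
class WordContextSketch {

    // page id -> words in page order, so a word's list index is its position on the page
    private final Map<String, List<String>> wordsByPage = new HashMap<>();

    // Splits the page text on whitespace and records each word at its position.
    public void addPage(String pageId, String text) {
        List<String> words = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        wordsByPage.put(pageId, words);
    }

    // Returns the matched word plus up to `window` words on each side, clipped to the page.
    public List<String> context(String pageId, int position, int window) {
        List<String> words = wordsByPage.get(pageId);
        if (words == null || window < 0 || position < 0 || position >= words.size()) {
            return new ArrayList<>();
        }
        int from = Math.max(0, position - window);
        int to = Math.min(words.size(), position + window + 1);
        return new ArrayList<>(words.subList(from, to));
    }

    // Assumption: positions were decoded from the payloads of an exact phrase query,
    // so sorting them and cutting into chunks of termsPerMatch yields one group per match.
    public static List<List<Integer>> groupByMatch(List<Integer> positions, int termsPerMatch) {
        if (termsPerMatch <= 0) {
            throw new IllegalArgumentException("termsPerMatch must be positive");
        }
        List<Integer> sorted = new ArrayList<>(positions);
        Collections.sort(sorted);
        List<List<Integer>> groups = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i += termsPerMatch) {
            int end = Math.min(sorted.size(), i + termsPerMatch);
            groups.add(new ArrayList<>(sorted.subList(i, end)));
        }
        return groups;
    }
}
```

For a page containing "we all think lucene rocks a lot", context("page-1", 3, 1) would return [think, lucene, rocks], and changing the window needs no reindexing. Whether a separate index like this performs well enough in practice is exactly what I am unsure about.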