Hi all,

I need the ability to match documents in which two terms occur within n paragraphs of each other. I had a look through the archives, and although many people have explained ways to implement per-sentence or per-paragraph indexing & searching, no one seems to have tackled this one yet.

The only idea I can come up with is this:

I will index the entire document, as normal, but also index the paragraphs separately, numbering them according to the order they occur in. (Storage space isn't an issue.) When searching, I will first find all documents that have both terms, using the full-content field.

Then I can get all the paragraphs that belong to that doc and contain either of the search terms. I would still have to implement a bit of logic to check which paragraphs have which term, and check the distance between them (from the order info I kept when indexing).
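
A minimal sketch of that distance check in plain Java, assuming the paragraph numbers for each term have already been fetched from the index and sorted in ascending order. The class and method names are hypothetical, and nothing here uses the Lucene API:

```java
import java.util.List;

final class ParagraphProximity {
    /**
     * Returns true if some paragraph containing term A and some paragraph
     * containing term B are at most maxDistance paragraphs apart.
     * Assumption: both lists are sorted in ascending order.
     */
    static boolean withinParagraphs(List<Integer> paragraphsWithA,
                                    List<Integer> paragraphsWithB,
                                    int maxDistance) {
        int i = 0;
        int j = 0;
        // Walk both sorted lists together, always advancing the smaller value,
        // so every closest pair is examined in one linear pass.
        while (i < paragraphsWithA.size() && j < paragraphsWithB.size()) {
            int a = paragraphsWithA.get(i);
            int b = paragraphsWithB.get(j);
            if (Math.abs(a - b) <= maxDistance) {
                return true;
            }
            if (a < b) {
                i++;
            } else {
                j++;
            }
        }
        return false;
    }
}
```

With term A in paragraphs 1 and 10 and term B in paragraphs 4 and 20, a call with maxDistance 3 would match on paragraphs 1 and 4. The check itself is linear in the number of matching paragraphs, so the cost is more likely in fetching the paragraphs than in comparing them.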

I'm sure this would work, but it would be very slow. I can't help feeling there's a better solution, that might involve inserting paragraph tags into the content in a special field in my index, and somehow using SpanQueries to find matches that have a given number of paragraph marks in between... but I don't know if that's possible.
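
To make the paragraph-mark idea concrete, here is a rough plain-Java sketch of the counting it implies. It scans an in-memory token list rather than an index, the marker token "__PARA__" and all names are made up for illustration, and it says nothing about whether SpanQueries can express the same thing:

```java
import java.util.List;

final class ParagraphMarkerScan {
    // Hypothetical marker token inserted between paragraphs at indexing time.
    static final String PARAGRAPH_MARK = "__PARA__";

    /**
     * Returns true if termA and termB occur somewhere in the token stream
     * with at most maxMarks paragraph marks between them.
     */
    static boolean termsWithinMarks(List<String> tokens, String termA,
                                    String termB, int maxMarks) {
        int paragraph = 0;  // number of marks seen so far
        int lastA = -1;     // paragraph of the most recent termA, -1 if none yet
        int lastB = -1;     // paragraph of the most recent termB, -1 if none yet
        for (String token : tokens) {
            if (token.equals(PARAGRAPH_MARK)) {
                paragraph++;
                continue;
            }
            if (token.equals(termA)) {
                lastA = paragraph;
            }
            if (token.equals(termB)) {
                lastB = paragraph;
            }
            // Comparing only the most recent occurrences is enough, because
            // an earlier occurrence can never be closer to the current token.
            if (lastA >= 0 && lastB >= 0 && Math.abs(lastA - lastB) <= maxMarks) {
                return true;
            }
        }
        return false;
    }
}
```

For the tokens "apple", "pie", "__PARA__", "filler", "__PARA__", "banana", the terms "apple" and "banana" have two marks between them, so they would match with maxMarks 2 but not with maxMarks 1.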

Does anyone have any ideas?

Thanks!
John B.



