Hi all,
I need the ability to match documents that have two terms that occur
within n paragraphs of each other. I had a look through the archives,
and although many people have explained ways to implement per-sentence
or per-paragraph indexing & searching, no one seems to have tackled this
one yet.
The only idea I can come up with is this:
I will index the entire document, as normal, but also index the
paragraphs separately, numbering them according to the order they occur
in. (Storage space isn't an issue). When searching, I will first find
all documents that have both terms, using the full-content field.
Then I can get all the paragraphs that belong to that doc and contain
either of the search terms. I would still have to implement a bit of
logic to check which paragraphs have which term, and check the distance
between them (from the order info I kept when indexing).
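
For what it's worth, the distance check in that last step might look
something like the Java below. This is only a sketch: ParagraphProximity
and withinParagraphs are names I made up (they are not Lucene classes),
and it assumes the paragraph numbers for each term have already been
pulled out of the index as sorted arrays.

```java
// Hypothetical helper, not part of Lucene.
final class ParagraphProximity {

    /**
     * Returns true if some paragraph containing term A and some paragraph
     * containing term B are at most n paragraphs apart.
     * Both arrays must hold paragraph numbers sorted in ascending order.
     */
    static boolean withinParagraphs(int[] paragraphsWithA, int[] paragraphsWithB, int n) {
        int i = 0;
        int j = 0;
        // Walk both sorted lists together, always advancing the smaller side.
        while (i < paragraphsWithA.length && j < paragraphsWithB.length) {
            long diff = (long) paragraphsWithA[i] - paragraphsWithB[j];
            if (Math.abs(diff) <= n) {
                return true;
            }
            if (diff < 0) {
                i++;
            } else {
                j++;
            }
        }
        return false;
    }
}
```

With n = 0 this reduces to "both terms in the same paragraph". The walk
itself is linear in the number of matching paragraphs, so the slow part
would be fetching the paragraphs, not comparing them.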
I'm sure this would work, but it would be very slow. I can't help
feeling there's a better solution, that might involve inserting
paragraph tags into the content in a special field in my index, and
somehow using SpanQueries to find matches that have a given number of
paragraph marks in between... but I don't know if that's possible.
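
To make the paragraph-mark idea concrete, here is the check I have in
mind, written as plain Java rather than as a SpanQuery (I don't know yet
whether SpanQueries can express it). ParagraphMarks and marksBetween are
made-up names, and the sketch assumes each paragraph mark is indexed as
its own token, so a mark never shares a position with a search term.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Lucene.
final class ParagraphMarks {

    /**
     * Counts the paragraph marks lying between token positions p and q.
     * markPositions must be sorted ascending and contain no duplicates;
     * p and q are assumed not to be mark positions themselves.
     */
    static int marksBetween(int[] markPositions, int p, int q) {
        int lo = Math.min(p, q);
        int hi = Math.max(p, q);
        return countBelow(markPositions, hi) - countBelow(markPositions, lo);
    }

    // Number of marks whose position is strictly less than pos.
    private static int countBelow(int[] sorted, int pos) {
        int idx = Arrays.binarySearch(sorted, pos);
        // A negative result encodes the insertion point as -(point) - 1.
        return idx >= 0 ? idx : -idx - 1;
    }
}
```

Two term occurrences would then count as a match when
marksBetween(...) <= n.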
Does anyone have any ideas?
Thanks!
John B.