The "formal" name for this stuff is "document filtering" or just "filtering". You can start by looking at TREC, which ran a filtering track for a number of years: http://trec.nist.gov/tracks.html

At any rate, one approach is to store your queries as Lucene documents, albeit short ones. Then, as others have said, you index each new incoming document into a MemoryIndex. From that, you can extract the key terms and use them to build a Query to run against your "query" index. The MoreLikeThis functionality should help in determining the important terms. Then you need to decide how to handle the results: you probably don't want to route the document to each and every query that matches.
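A rough sketch of the "queries as documents" idea, using only the Java standard library rather than Lucene: the stored queries go into a small inverted index (term -> query ids), and the terms of each incoming document are looked up against it, so only queries that share at least one term with the document are ever examined. The class name, the tokenizer, the coverage score (matched query terms / total query terms) and the minCoverage/maxResults cut-offs are all assumptions for illustration. In Lucene itself you would use MemoryIndex for the incoming document and MoreLikeThis to pick the important terms; this sketch does not reproduce either.

```java
import java.util.*;

public class QueryIndexMatcher {
    // Inverted index over the stored queries: term -> ids of queries containing that term.
    private final Map<String, Set<String>> postings = new HashMap<>();
    // Number of distinct terms in each stored query, used to normalize the score.
    private final Map<String, Integer> queryLength = new HashMap<>();

    // Naive tokenizer (assumption): lowercase, split on anything that is not a letter or digit.
    static Set<String> tokenize(String text) {
        Set<String> terms = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Index a stored query as if it were a (short) document.
    void addQuery(String id, String queryText) {
        Set<String> terms = tokenize(queryText);
        queryLength.put(id, terms.size());
        for (String t : terms) {
            postings.computeIfAbsent(t, k -> new HashSet<>()).add(id);
        }
    }

    // Use the incoming document's terms as the "query" against the query index.
    // Returns (queryId, coverage) pairs, best first, filtered and capped so the
    // document is not routed to every query that shares a single term with it.
    List<Map.Entry<String, Double>> match(String document, double minCoverage, int maxResults) {
        Map<String, Integer> hits = new HashMap<>();
        for (String t : tokenize(document)) {
            for (String id : postings.getOrDefault(t, Set.of())) {
                hits.merge(id, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Double>> results = new ArrayList<>();
        for (Map.Entry<String, Integer> e : hits.entrySet()) {
            double coverage = (double) e.getValue() / queryLength.get(e.getKey());
            if (coverage >= minCoverage) results.add(Map.entry(e.getKey(), coverage));
        }
        // Highest coverage first; ties broken by query id for a stable order.
        results.sort(Map.Entry.<String, Double>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Double>comparingByKey()));
        return new ArrayList<>(results.subList(0, Math.min(maxResults, results.size())));
    }

    public static void main(String[] args) {
        QueryIndexMatcher idx = new QueryIndexMatcher();
        idx.addQuery("q1", "lucene release");
        idx.addQuery("q2", "election results");
        idx.addQuery("q3", "lucene performance tuning");
        System.out.println(idx.match("Apache Lucene 2.4 release announced", 0.5, 10));
    }
}
```

The threshold and the result cap are just one simple way of deciding what to do with the matches; a real system might instead rank by a relevance score and deliver only the top few.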

-Grant

On Nov 23, 2008, at 2:35 AM, Ian Holsman wrote:

Anshum wrote:
Hi Ian,
I guess that could be achieved by writing code that reads the stored queries
and runs each of them against every incoming document (using Lucene).
Assuming that I got the question right! :)
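A minimal sketch of this brute-force approach, in plain Java with no Lucene dependency. Each stored query is assumed to be a plain conjunction of terms; the StoredQuery class, the tokenizer and the AND semantics are hypothetical simplifications, not Lucene's query model. The point it illustrates is that every stored query is evaluated for every document, so the cost per document grows linearly with the number of queries.

```java
import java.util.*;

public class BruteForceMatcher {
    // Hypothetical stand-in for a stored query: an id plus terms that must ALL appear (AND semantics).
    static final class StoredQuery {
        final String id;
        final Set<String> requiredTerms;

        StoredQuery(String id, Set<String> requiredTerms) {
            this.id = id;
            this.requiredTerms = requiredTerms;
        }
    }

    // Naive tokenizer (assumption): lowercase, split on anything that is not a letter or digit.
    static Set<String> tokenize(String text) {
        Set<String> terms = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Runs every stored query against one incoming document and returns the ids that match.
    static List<String> match(List<StoredQuery> queries, String document) {
        Set<String> docTerms = tokenize(document);
        List<String> matches = new ArrayList<>();
        for (StoredQuery q : queries) {
            if (docTerms.containsAll(q.requiredTerms)) matches.add(q.id);
        }
        return matches;
    }

    public static void main(String[] args) {
        List<StoredQuery> queries = List.of(
                new StoredQuery("q1", tokenize("lucene release")),
                new StoredQuery("q2", tokenize("election results")));
        System.out.println(match(queries, "Apache Lucene 2.4 release announced"));
    }
}
```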

Yes, that is one way, but probably not the most efficient one.

Think of something like http://www.google.com/alerts, but instead of running once a day, it would run each time it sees a document ("as-it-happens" mode),
and you would have a couple of million queries to run through.

regards
Ian
--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com

The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw............

On Sun, Nov 23, 2008 at 9:27 AM, Ian Holsman <[EMAIL PROTECTED]> wrote:

Hi, apologies for the off-topic question.

I was wondering if anyone knew of an open source solution (or a pointer to
the algorithms) that does the reverse of Lucene.
By that I mean: store a whole lot of queries, and run them against a
document to see which queries match it (with a score, etc.).

One use case would be news articles, with several people writing queries
so they get alerted when an article matches a certain condition.

Regards
Ian

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
