The formal name for this is "document filtering", or just
"filtering". A good place to start is TREC, which had a
filtering task for a number of years: http://trec.nist.gov/tracks.html
At any rate, one approach is to store your queries as Lucene
documents, albeit short ones. Then, as others have said, you index
each new incoming doc into a MemoryIndex. From that, you can extract
the key terms and use them to build a Query to run against your
"query" index. The MoreLikeThis functionality should help in
determining the important terms. Finally, you need to decide how to
handle the results. You probably don't want to route the document to
each and every query that matches.
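
A rough sketch of that flow, in plain Python rather than Lucene so it
stays self-contained. Everything named here (QueryIndex, add_query,
match, and the max_key_terms / min_score / top_n parameters) is made up
for illustration. Raw term frequency stands in for the term selection
that MoreLikeThis would do, and the overlap score is an assumption, not
Lucene's scoring.

```python
from collections import Counter, defaultdict
import re


def tokenize(text):
    # Crude lowercase word tokenizer; a real analyzer would do much more.
    return re.findall(r"[a-z0-9]+", text.lower())


class QueryIndex:
    """Hypothetical index of stored queries, keyed by their terms."""

    def __init__(self):
        self.query_terms = {}              # query id -> set of query terms
        self.postings = defaultdict(set)   # term -> ids of queries using it

    def add_query(self, query_id, text):
        terms = set(tokenize(text))
        self.query_terms[query_id] = terms
        for term in terms:
            self.postings[term].add(query_id)

    def match(self, document, max_key_terms=25, min_score=0.5, top_n=10):
        # Pick the document's "key terms" (assumption: most frequent terms).
        counts = Counter(tokenize(document))
        key_terms = {term for term, _ in counts.most_common(max_key_terms)}

        # Look up only the stored queries that share a key term.
        hits = Counter()
        for term in key_terms:
            for query_id in self.postings.get(term, ()):
                hits[query_id] += 1

        # Score = fraction of the query's terms found in the document.
        scored = [(hits[q] / len(self.query_terms[q]), q) for q in hits]
        # Drop weak matches and cap the list, so the document is not
        # routed to each and every query that matches.
        scored = [(s, q) for s, q in scored if s >= min_score]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [(q, s) for s, q in scored[:top_n]]
```

You would call add_query() once per stored query and match() once per
incoming document. The point of the postings map is that each document
only touches the queries sharing a term with it, instead of running
all of the stored queries every time.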
-Grant
On Nov 23, 2008, at 2:35 AM, Ian Holsman wrote:
Anshum wrote:
Hi Ian,
I guess that could be achieved if you write code to read the queries
and run each one against each document (using Lucene).
Assuming that I got the question right! :)
Yes, that is one way, but probably not the most efficient one.
Think of something like http://www.google.com/alerts, but instead of
running once a day, it would run each time it sees a document
(as-it-happens mode), and you would have a couple of million queries
to run through.
regards
Ian
--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com
The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw............
On Sun, Nov 23, 2008 at 9:27 AM, Ian Holsman <[EMAIL PROTECTED]>
wrote:
Hi. apologies for the off-topic question.
I was wondering if anyone knew of an open source solution (or a
pointer to the algorithms) that does the reverse of Lucene.
By that I mean: store a whole lot of queries, and run them against a
document to see which queries match it (with a score, etc.).
The use case I can see for this is a news article coming in, with
several people having written queries to get alerted if it matches a
certain condition.
Regards
Ian
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--------------------------
Grant Ingersoll
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ