Jeremy Hinegardner wrote:
> Hi all,
>
> I'm looking at using Ferret for categorizing documents.
> Essentially what I have are thousands of query rules that if a document
> matches, then it belongs to the category that is associated with that
> rule. Normally what we all do is have documents indexed and then run a
> query against the index to get back the documents that match the query.
>
> What I want to do is the inverse. I have thousands of queries and I
> want to run all of them against one document at a time. The queries
> that match the document essentially categorize the document into the
> associated category.

<snip>

> Thought, comments, rants, raves, brainstorms?

Random thought that might or might not work, depending on whether your
queries are simple enough and how much data you want back: just invert
the problem. Store the queries in Ferret, and treat your document as
the query. Random example:

irb(main):015:0> index = Index::Index.new
irb(main):016:0> index << "hat"
irb(main):017:0> index << "fox"
irb(main):018:0> doc = "the quick brown fox jumped over the lazy dog"
irb(main):018:0> index.search_each(doc) { |id, score| puts index[id].load.to_yaml + score.to_s }
--- !map:Ferret::Index::LazyDoc
:id: fox
0.0425622686743736
=> 1

I've got absolutely no idea how well the query parser will handle
larger documents, but it's worth a try...

--
Alex
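
P.S. If the rules really are just bags of terms, the same inversion can be sketched in plain Ruby without Ferret at all. This is only an illustration of the idea, not how Ferret does it: the RULES table, build_rule_index and categorize are names I made up, and I'm assuming a rule matches when any one of its terms appears in the document.

```ruby
require 'set'

# Hypothetical stand-in for the stored queries: each category maps to a
# list of plain terms. Assumption: a rule matches if ANY of its terms
# occurs in the document.
RULES = {
  'animals'  => %w[fox dog],
  'clothing' => %w[hat scarf]
}

# Build an inverted index: term -> set of categories whose rule uses it.
def build_rule_index(rules)
  index = Hash.new { |hash, term| hash[term] = Set.new }
  rules.each do |category, terms|
    terms.each { |term| index[term.downcase] << category }
  end
  index
end

# Treat the document as the query: tokenise it and collect every
# category reachable from one of its terms.
def categorize(rule_index, document)
  tokens = document.downcase.scan(/[a-z0-9]+/).uniq
  tokens.each_with_object(Set.new) do |token, matched|
    # key? avoids creating empty entries through the default block
    matched.merge(rule_index[token]) if rule_index.key?(token)
  end
end

rule_index = build_rule_index(RULES)
doc = 'the quick brown fox jumped over the lazy dog'
puts categorize(rule_index, doc).to_a.sort.inspect  # prints ["animals"]
```

The work per document is proportional to the number of distinct tokens in it rather than to the number of rules, which is the point of inverting the problem. Anything beyond simple term rules (phrases, boolean operators, wildcards) would need a real query engine again.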