Jeremy Hinegardner wrote:
> Hi all,
> 
> I'm looking at using Ferret for categorizing documents.
> Essentially what I have are thousands of query rules: if a document
> matches a rule, then it belongs to the category associated with that
> rule.  Normally what we all do is have documents indexed and then run a
> query against the index to get back the documents that match the query.
> 
> What I want to do is the inverse.  I have thousands of queries and I
> want to run all of them against one document at a time.  The queries
> that match the document essentially categorize the document into the
> associated category.
<snip>
> Thought, comments, rants, raves, brainstorms?

Random thought that might or might not work, depending on whether your
queries are simple enough and how much data you want back:  just invert
the problem.  Store the queries in Ferret as documents, and treat your
document as the query.  Random example:

irb(main):015:0> index = Index::Index.new
irb(main):016:0> index << "hat"
irb(main):017:0> index << "fox"
irb(main):018:0> doc = "the quick brown fox jumped over the lazy dog"
irb(main):018:0> index.search_each(doc) { |id, score| puts index[id].load.to_yaml + score.to_s }
--- !map:Ferret::Index::LazyDoc
:id: fox
0.0425622686743736
=> 1
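The same inversion can be sketched in plain Ruby without Ferret, to show why it scales: build an inverted index from rule terms to categories once, then look up each distinct word of the incoming document instead of running every rule against it. This is a minimal sketch under stated assumptions: the `RULES` hash, its categories, and the `build_rule_index`/`categorize` helpers are hypothetical, and only OR-of-terms rules (like the "hat"/"fox" example above) are handled.

```ruby
require 'set'

# Hypothetical rule set: a document matches a category if it contains
# ANY of that category's terms (OR semantics, as in the example above).
RULES = {
  'headwear' => %w[hat cap],
  'animals'  => %w[fox dog],
  'weather'  => %w[rain snow]
}.freeze

# Build the inverted index once: term => set of categories using that term.
def build_rule_index(rules)
  index = Hash.new { |h, k| h[k] = Set.new }
  rules.each do |category, terms|
    terms.each { |term| index[term.downcase] << category }
  end
  index
end

# Categorize one document by looking up each of its distinct words,
# rather than evaluating every rule against the document.
def categorize(text, rule_index)
  terms = text.downcase.scan(/\w+/).uniq
  terms.each_with_object(Set.new) do |term, matched|
    # key? check avoids the default block creating empty entries
    matched.merge(rule_index[term]) if rule_index.key?(term)
  end
end

rule_index = build_rule_index(RULES)
doc = 'the quick brown fox jumped over the lazy dog'
p categorize(doc, rule_index).to_a.sort # prints ["animals"]
```

Rules with AND, phrases, or NOT don't fit this directly; one option is to use such an index only to shortlist candidate rules and then evaluate each candidate fully against the document.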

I've got absolutely no idea how well the query parser will handle larger
documents (very long queries, or text containing query syntax such as
colons, quotes and parentheses), but it's worth a try...
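One way to reduce the risk of the document text being misread as query syntax is to strip it down to plain terms before searching. The sketch below is a hypothetical helper (`document_to_query` is not part of Ferret); it assumes that keeping only letter/digit runs and dropping the words and/or/not (usually stop words in English text anyway) is enough to avoid parse surprises, and its `max_terms` cap is an arbitrary illustrative value, not a documented Ferret limit.

```ruby
# Words that query parsers commonly treat as boolean operators.
OPERATOR_WORDS = %w[and or not].freeze

# Hypothetical helper: reduce free text to space-separated plain terms so
# characters the query parser treats as syntax (":", "+", "-", quotes,
# parentheses, "*", "?", "~") cannot alter the generated query.
# max_terms is an arbitrary illustrative cap, not a documented Ferret limit.
def document_to_query(text, max_terms: 1000)
  text.downcase
      .scan(/[[:alnum:]]+/)                      # keep only runs of letters/digits
      .reject { |t| OPERATOR_WORDS.include?(t) } # drop and/or/not in any case
      .uniq                                      # repeats don't change which rules match
      .first(max_terms)                          # bound query size for large documents
      .join(' ')
end

doc = 'Title: "The Fox" (NOT a dog) -- fox+hat?'
puts document_to_query(doc) # prints: title the fox a dog hat
```

The resulting string could then be passed to `index.search_each` in place of the raw `doc` string.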

-- 
Alex
_______________________________________________
Ferret-talk mailing list
Ferret-talk@rubyforge.org
http://rubyforge.org/mailman/listinfo/ferret-talk