[EMAIL PROTECTED] writes:
>
> Viewed as an information retrieval problem (not the best way to view it,
> but this is just an initial approach), one could then (1) create a taxonomy
> of drugs and a taxonomy of conditions, and (2) implement a concept-oriented
> (taxonomy-oriented) search of the corpus for something like:
> {beta_blocker} AND {cardiac_condition}
> where '{beat_blocker}' expands via the taxonomy to the set of terms (words,
> sequences of words, etc.) that "fall under" that "concept" in the drug
> taxonomy and similarly for '{cardiac_condition}' under the condition
> taxonomy.
>
I'm not sure that I understood everything, but there's one question coming
into my mind:
Why can't you just invert your taxonomy, that is expand each drug/condition
in the indexed documents to the list of the drug/condition classes it belongs
to? Instead of expanding the query.
> Verity didn't have these scaleability problems. The one suggestion I have
> seen so far in the responses that seems relevant to the problem is the
> suggestion to transform the taxonomy (taxonomies) into an Analyzer.
An analyzer in lucene is just the instance used in preparing a document
for indexing and a query term for searching. It does tokenisation of input
and any additional preprocessing you implement.
So your specific analyzer might take a drug as input and output all
relevant terms. That's how you could implement your aproach in lucene.
I cannot comment on combining thousands of terms in a query though.
Morus
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]