Thanks for these suggestions.  The idea of adding taxonomy-related terms
to the documents is an interesting one and bears some thought.  However, if
I have to pre-process the corpus to determine which terms to add, and then
to add them, it would seem that I've already accomplished my primary goal
and don't need an indexer and search engine.  Remember:  this is not really
an information retrieval application (with document-level granularity) that
is being contemplated here, but an information extraction and text/data
mining application (with "fact-level" granularity).   My hope was to
leverage a search engine, guided by taxonomies, to accomplish this at least
as a first cut.

I do find Morus's suggestion to do an "inverse expansion" of terms in the
index at indexing time to be very intriguing as well.  Perhaps it is also
what was meant by Ype's suggestion about adding stuff to the document
(meaning adding stuff to the index).

It appears I will also need to handle the identification of matched terms
myself.  Verity also supports term highlighting, but I am not at all
certain that it returns information about the exact string that triggered
each highlighted match.  Perhaps if the "inverse expansion" approach can be
made to work, it would eliminate this need.  And it might also eliminate
the need for the very large queries.  The details are unclear at this
point, but the possibilities are interesting.

The suggestion to use Jython is also appreciated; I was already considering
it.  I have not used Jython yet, but I have developed all of my
ontology/taxonomy/dictionary/thesaurus translation tools in Python (and
yes, I do know the differences among all of these).  I've even started to
develop some of my interface stuff in Tkinter, but if I'm going to go the
Java route I'll probably abandon that in favor of Swing.

Well, I can see that I have a bit of work to do.  I do have an
undergraduate and a graduate student here at NC State working with me, and
perhaps I can squeeze some of this work out of them :-).

--------------------------------------
Gary H. Merrill
Director and Principal Scientist, New Applications
Data Exploration Sciences
GlaxoSmithKline Inc.
(919) 483-8456