Hi Dave,
Both types of key artifacts are part of the default KIM pipeline,
i.e. they run in a standard GATE pipeline.
The key phrase extraction was originally developed by Kalina
Bontcheva (USFD) and probably others at USFD. We took it some years
ago and worked together to extend it. It is now available in GATE:
check the available CREOLE plugins and search for Keyphrase. It is
in /plugins/Keyphrase_Extraction_Algorithm
The module is based on TF.IDF, where the document frequency used in IDF
is calculated on a pre-defined corpus during the training of the model.
You can limit the size of the model and the number of tokens in a phrase
(e.g. taking only phrases that are 2 to 3 tokens long). At runtime you
can specify how many keyphrases you'd like to get per doc.
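
To make the TF.IDF idea concrete, here is a minimal Python sketch. It is
not the GATE plugin's code or API: the function names (ngrams,
train_model, key_phrases), the whitespace tokenisation and the smoothed
IDF formula log((1 + N) / (1 + df)) are all assumptions chosen for
illustration. It only shows the general shape: document frequencies are
learned from a training corpus, phrase length and model size are
limited at training time, and the number of keyphrases per document is
chosen at runtime.

```python
import math
from collections import Counter


def ngrams(tokens, min_len=2, max_len=3):
    """Yield candidate phrases that are min_len..max_len tokens long."""
    for n in range(min_len, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def train_model(corpus, min_len=2, max_len=3, max_size=None):
    """Count, for each phrase, how many training documents contain it."""
    df = Counter()
    for doc in corpus:
        # set() so that a phrase is counted once per document
        df.update(set(ngrams(doc.lower().split(), min_len, max_len)))
    if max_size is not None:
        # Hypothetical size limit: keep only the most frequent phrases
        df = Counter(dict(df.most_common(max_size)))
    return {"df": df, "n_docs": len(corpus),
            "min_len": min_len, "max_len": max_len}


def key_phrases(model, text, top_k=5):
    """Return the top_k phrases of `text` ranked by TF * IDF."""
    tf = Counter(ngrams(text.lower().split(),
                        model["min_len"], model["max_len"]))
    scores = {}
    for phrase, count in tf.items():
        # Smoothed IDF (an assumption): unseen phrases get the highest weight
        idf = math.log((1 + model["n_docs"]) / (1 + model["df"].get(phrase, 0)))
        scores[phrase] = count * idf
    # Sort by descending score, then alphabetically to break ties
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [phrase for phrase, _ in ranked[:top_k]]
```

For example, after model = train_model(docs, min_len=2, max_len=3), a
call such as key_phrases(model, new_text, top_k=10) returns at most ten
phrases for that document. A phrase that occurs in every training
document gets an IDF of zero and so drops to the bottom of the ranking.
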
Although we've changed it since, I'm pretty certain that you would be
able to get similar results easily with what is available in GATE.
The key entity identification components are derived from this one,
but they rely on a unique (across the entire corpus) identifier for
each entity - in our case, the URIs of instances in a knowledge base.
Without it you cannot do the stats. I do not think that this
functionality is available in GATE, mainly because you do not have this
unique ID capability there. With all the ontology extensions that the
community has introduced in recent years I might be wrong, though, so
please check with the GATE list.
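
To illustrate why the unique identifier matters, here is a minimal
Python sketch of the same TF.IDF statistics keyed on entity URIs instead
of phrase text. It is not KIM's implementation: the function names, the
(surface text, URI) input format, the example.org URIs and the smoothed
IDF formula are assumptions made for illustration. The point is that
different mentions of one entity only add up if they share one
identifier.

```python
import math
from collections import Counter


def train_entity_model(corpus_annotations):
    """corpus_annotations: one list of (surface_text, uri) pairs per document."""
    df = Counter()
    for doc in corpus_annotations:
        # Count each URI once per document, whatever its surface forms were
        df.update({uri for _, uri in doc})
    return {"df": df, "n_docs": len(corpus_annotations)}


def key_entities(model, annotations, top_k=3):
    """Rank the entity URIs of one document by TF * IDF."""
    tf = Counter(uri for _, uri in annotations)
    scores = {}
    for uri, count in tf.items():
        # Smoothed IDF (an assumption), same form as for phrases
        idf = math.log((1 + model["n_docs"]) / (1 + model["df"].get(uri, 0)))
        scores[uri] = count * idf
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [uri for uri, _ in ranked[:top_k]]


# Hypothetical knowledge-base URIs, for illustration only
IBM = "http://example.org/kb#IBM"
LONDON = "http://example.org/kb#London"

corpus = [
    [("London", LONDON), ("IBM", IBM)],
    [("London", LONDON)],
    [("the city of London", LONDON)],
]
model = train_entity_model(corpus)

doc = [("IBM", IBM), ("International Business Machines", IBM),
       ("Big Blue", IBM), ("London", LONDON)]
top = key_entities(model, doc, top_k=1)
```

Here the three different mentions of IBM count as one entity with a term
frequency of 3, so it outranks London, which appears in every training
document. Counting the raw strings instead would split those mentions
into three unrelated items.
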
all the best
borislav
On Mar 2, 2010, at 4:49 PM, Harrill, David C wrote:
To whom it may concern,
In working with the KIM tool, I came across the Document Detail
screen, which displays both the Features associated with the document
and the document content. Within the Features section, there
exist two Features (KeyEntities and KeyPhrases). Are these two
features derived from the GATE application and, if so, using what GATE
plug-in? Otherwise, how do these entities and phrases get populated
on this screen? I appreciate any information you can provide on this
matter and I look forward to hearing from you.
Thanks,
Dave
_______________________________________________
Kim-discussion mailing list
[email protected]
http://ontotext.com/mailman/listinfo/kim-discussion