Sorry guys, I sent this only to Dave. Forwarding to the list as well.
b
Begin forwarded message:
From: borislav popov <borislav.po...@ontotext.com>
Date: March 2, 2010 6:19:10 PM GMT+02:00
To: "Harrill, David C" <david.c.harr...@lmco.com>
Subject: Re: [Kim-discussion] KIM_Entity_Search
Dave
You have come to an existential question. The simple answer is:
- if you need pure text analysis and you know what to do with the
results afterwards - you need only GATE.
- if you need annotations with respect to structured data sets
(knowledge bases, conceptual models, ...) that allow search and
navigation based on full-text search (FTS), the structure of the
data set, co-occurrence, or a combination of those; or if you need
ways to obtain content through RSS feeds or focused crawling; etc.
- you need KIM.
So briefly - the text analysis we have by default or produce for
customers is GATE compliant, and in most cases it can be executed
within a pure GATE environment. GATE Embedded is an integral part of
KIM for modeling documents, annotations, corpora, etc.
The rest is content feeding services, semantic annotation, indexing
& search. The Web UI you know is just one example - customers often
choose a completely different route.
Besides this, we provide customizations of everything in a KIM-based
system - from crawlers and IE pipelines to background knowledge and
search - and we support our customers as they go on.
For GATE only: there too you might stumble upon things that are not
present in the main package - like parallel processing, search based
on annotation patterns, and manual curation infrastructure.
For these and more we have joint offerings with the GATE group
and/or other partners, like Matrixware.
So the answer is not that simple, as the dependency is multi-layered.
There are also several products, not yet public, on which we
cooperate heavily with GATE, but bits and pieces already go into
custom projects for customers.
I remember a project several years ago where we had to explain how
in GATE there is a KIM Client calling a KIM Server, which is itself
based on GATE. Phew.
So once you know what you would like to do, we hope this info will
help you take the right route and not waste time.
all the best
b
On Mar 2, 2010, at 5:46 PM, Harrill, David C wrote:
Borislav,
Thanks for your quick reply. I wanted to ask an additional question
pertaining to the overall KIM application. I have also been working
with GATE and have been attempting to differentiate the two
applications, i.e. what overall capability KIM provides that GATE
(with its associated plug-ins) does not. I have been reviewing
the vast documentation for both applications and have been unable
to come up with a clear-cut difference (excluding the wonderful
search capabilities in KIM). Could you provide me with
information on why an individual would primarily use GATE over KIM
or vice versa? Again, thanks for your assistance in this
matter.
Dave
From: borislav popov [mailto:borislav.po...@ontotext.com]
Sent: Tuesday, March 02, 2010 10:28 AM
To: Harrill, David C
Cc: kim-discussion@ontotext.com
Subject: Re: [Kim-discussion] KIM_Entity_Search
Hi Dave,
Both types of key artifacts are part of the default
KIM pipeline, i.e. they run in a standard GATE pipeline.
The key phrase extraction was originally developed by Kalina
Bontcheva (USFD) and probably others at USFD. We took it some years
ago and worked together to extend it. It is now available in GATE -
check the available CREOLE plugins and search for Keyphrase. It is
in /plugins/Keyphrase_Extraction_Algorithm
The module is based on TF.IDF, where the document frequency in IDF
is calculated on a pre-defined corpus during the training of the
model. You can limit the size of the model and the number of tokens
in a phrase (e.g. taking only phrases 2 to 3 tokens long). At
runtime you can specify how many keyphrases you'd like to get per
doc.
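To make the idea concrete, here is a minimal sketch in Python of
TF.IDF keyphrase scoring as described above. It is not the code of
the GATE plugin: the function names, the whitespace tokenizer, the
smoothed IDF formula and the max_size, min_len, max_len and top_k
parameters are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def candidate_phrases(tokens, min_len=2, max_len=3):
    """Yield every run of min_len..max_len consecutive tokens as a phrase."""
    for n in range(min_len, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def train_model(corpus, min_len=2, max_len=3, max_size=None):
    """Training: count in how many corpus documents each phrase occurs."""
    df = Counter()
    for doc in corpus:
        # set(...) so a phrase is counted once per document
        df.update(set(candidate_phrases(doc.lower().split(), min_len, max_len)))
    if max_size is not None:
        # Hypothetical size limit: keep only the most frequent phrases.
        df = Counter(dict(df.most_common(max_size)))
    return {"df": df, "n_docs": len(corpus),
            "min_len": min_len, "max_len": max_len}


def key_phrases(model, text, top_k=5):
    """Runtime: rank the phrases of one document by TF * IDF."""
    tf = Counter(candidate_phrases(text.lower().split(),
                                   model["min_len"], model["max_len"]))

    def score(phrase):
        # Smoothed IDF (an assumption), so unseen phrases do not divide by zero.
        idf = math.log((1 + model["n_docs"]) / (1 + model["df"][phrase])) + 1
        return tf[phrase] * idf

    # Highest score first; ties are broken alphabetically.
    return sorted(tf, key=lambda p: (-score(p), p))[:top_k]
```

Training happens once on the reference corpus; key_phrases is then
called per document with the number of phrases wanted.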
I'm pretty certain, although we've changed it, that you would be
able to get similar results easily with what is available in GATE.
The key entities identification components are derived from this
one, but they rely on a unique (for the entire corpus) identifier
for each entity - in our case URIs of instances in a knowledge base.
Without it you cannot do the stats. I do not think that this
functionality is available in GATE, mainly because you do not have
this unique ID capability there. However, with all the ontology
extensions that the community has introduced in recent years I
might be wrong, so please check with the GATE list.
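A small Python sketch of why the unique identifier matters: when
every mention carries a URI, different surface forms of the same
entity add up to one count, and document frequency can be computed
across the corpus. This is only an illustration, not KIM code; the
function names, the (surface text, URI) pair layout, the smoothed
IDF formula and the example.org URIs are all assumptions.

```python
import math
from collections import Counter


def train_entity_model(corpus_annotations):
    """corpus_annotations: one list of (surface_text, uri) pairs per document."""
    df = Counter()
    for annotations in corpus_annotations:
        # Count each URI once per document, whatever its surface forms.
        df.update({uri for _, uri in annotations})
    return df, len(corpus_annotations)


def key_entities(df, n_docs, annotations, top_k=3):
    """Rank the entities of one document by TF * IDF over their URIs."""
    tf = Counter(uri for _, uri in annotations)

    def score(uri):
        # Smoothed IDF (an assumption), safe for URIs unseen in training.
        return tf[uri] * (math.log((1 + n_docs) / (1 + df[uri])) + 1)

    # Highest score first; ties are broken alphabetically.
    return sorted(tf, key=lambda u: (-score(u), u))[:top_k]


# Hypothetical knowledge-base URIs, for illustration only.
IBM = "http://example.org/kb/IBM"
SOFIA = "http://example.org/kb/Sofia"

corpus = [
    [("Sofia", SOFIA)],
    [("Sofia", SOFIA), ("IBM", IBM)],
    [("Sofia", SOFIA)],
]
df, n_docs = train_entity_model(corpus)

doc = [("IBM", IBM), ("International Business Machines", IBM),
       ("Big Blue", IBM), ("Sofia", SOFIA), ("Sofia", SOFIA)]
ranked = key_entities(df, n_docs, doc)
```

Counting by surface string instead would split the three IBM
mentions into three unrelated phrases.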
all the best
borislav
On Mar 2, 2010, at 4:49 PM, Harrill, David C wrote:
To whom it may concern,
In working with the KIM tool, I came across the Document Detail
screen, which displays both the Features associated with the
document and the document content. Within the Features
section, there exist two Features (KeyEntities and KeyPhrases).
Are these two features derived from the GATE application and, if so,
using what GATE plug-in? Otherwise, how do these entities and
phrases get populated on this screen? I appreciate any information
you can provide and I look forward to hearing from you.
Thanks,
Dave
_______________________________________________
Kim-discussion mailing list
Kim-discussion@ontotext.com
http://ontotext.com/mailman/listinfo/kim-discussion