sorry guys. sent this only to dave. fwding to the list as well
b

Begin forwarded message:

From: borislav popov <borislav.po...@ontotext.com>
Date: March 2, 2010 6:19:10 PM GMT+02:00
To: "Harrill, David C" <david.c.harr...@lmco.com>
Subject: Re: [Kim-discussion] KIM_Entity_Search

Dave
        you have come to an existential question. The simple answer is:
- If you need pure text analysis and you know what to do with the results afterwards, you need only GATE.
- If you need annotations with respect to structured data sets (knowledge bases, conceptual models, ...) that allow search and navigation based on FTS, the structure of the data set, co-occurrence, or a combination of those; or if you need ways to obtain content through RSS feeds or focused crawling, etc., you need KIM.

So briefly: the text analysis we have by default or produce for customers is GATE compliant, and in most cases it can be executed within a pure GATE environment. GATE Embedded is an integral part of KIM for modeling documents, annotations, corpora, etc. The rest are content feeding services, semantic annotation, indexing & search. The Web UI you know is just one example; customers often choose a completely different route. Besides this, we provide customizations of everything in a KIM-based system, from crawlers and IE pipelines to background knowledge and search, and we support our customers as they go.

For GATE only: there, too, you might stumble upon things that are not present in the main package, like parallel processing, annotation-pattern-based search, and manual curation infrastructure. For these and more we have joint offerings with the GATE group and/or other partners, like Matrixware.

So the answer is not that simple, as the dependency is multi-layered. There are also several products, not yet public, on which we cooperate heavily with GATE, and bits and pieces of them already go into custom projects for customers.

I remember a project several years ago where we had to explain how in GATE there is a KIM Client calling a KIM Server which is based on GATE. Phew.

So when you know what you would like to do - we hope this info will help you take the right route and not waste time.

all the best
b


On Mar 2, 2010, at 5:46 PM, Harrill, David C wrote:

Borislav,

Thanks for your quick reply. I wanted to ask an additional question pertaining to the overall KIM application. I have also been working with GATE and have been attempting to differentiate the two applications, i.e. what overall capability KIM provides that GATE (with its associated plug-ins) does not. I have been reviewing the vast documentation for both applications and have been unable to come up with a clear-cut difference (excluding the wonderful search capabilities in KIM). Could you provide me with information on why an individual would primarily use GATE over KIM, or vice versa? Again, thanks for your assistance in this matter.

Dave

From: borislav popov [mailto:borislav.po...@ontotext.com]
Sent: Tuesday, March 02, 2010 10:28 AM
To: Harrill, David C
Cc: kim-discussion@ontotext.com
Subject: Re: [Kim-discussion] KIM_Entity_Search

Hi Dave,
both types of key artifacts are part of the default KIM pipeline, i.e. they run in a standard GATE pipeline. The key phrase extraction was originally developed by Kalina Bontcheva (USFD) and probably others at USFD. We took it some years ago and worked together to extend it. It is now available in GATE: check the available CREOLE plugins and search for Keyphrase. It is in /plugins/Keyphrase_Extraction_Algorithm. The module is based on TF.IDF, where the document frequency in IDF is calculated on a pre-defined corpus during the training of the model. You can limit the size of the model and the number of tokens in a phrase (e.g. taking only phrases 2 to 3 tokens long). At runtime you can specify how many keyphrases you'd like to get per doc.
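
[Editor's illustration] The TF.IDF scheme described above can be sketched roughly as follows. This is a minimal, hypothetical Python sketch, not the code of the GATE plugin: the function names (train_model, extract_keyphrases), the max_model_size and top_n parameters, the regex tokenizer, and the smoothed IDF formula are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Naive lowercase word tokenizer (assumption; a real pipeline uses a proper tokeniser).
    return re.findall(r"[a-z0-9]+", text.lower())


def candidate_phrases(tokens, min_len=2, max_len=3):
    # Yield every contiguous token n-gram with min_len <= n <= max_len.
    for n in range(min_len, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def train_model(corpus, min_len=2, max_len=3, max_model_size=None):
    # "Training": count in how many corpus documents each phrase occurs.
    df = Counter()
    for doc in corpus:
        df.update(set(candidate_phrases(tokenize(doc), min_len, max_len)))
    if max_model_size is not None:
        # Limit the model size by keeping only the most frequent phrases.
        df = Counter(dict(df.most_common(max_model_size)))
    return {"df": df, "n_docs": len(corpus), "min_len": min_len, "max_len": max_len}


def extract_keyphrases(model, text, top_n=5):
    # Runtime: score each phrase of the new document by TF * smoothed IDF.
    tf = Counter(candidate_phrases(tokenize(text), model["min_len"], model["max_len"]))
    scores = {}
    for phrase, count in tf.items():
        idf = math.log((1 + model["n_docs"]) / (1 + model["df"].get(phrase, 0))) + 1
        scores[phrase] = count * idf
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [phrase for phrase, _ in ranked[:top_n]]
```

The document frequencies are fixed once at training time, so at runtime only the term frequencies of the new document have to be counted.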

I'm pretty certain, although we've changed it, that you would be able to get similar results easily with what is available in GATE.

The key entity identification components are derived from this one, but they rely on a unique (corpus-wide) identifier for each entity, in our case the URIs of instances in a knowledge base. Without it you cannot do the stats. I do not think that this functionality is available in GATE, mainly because you do not have this unique ID capability there. However, with all the ontology extensions that the community has introduced in recent years I might be wrong, so please check with the GATE list.
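
[Editor's illustration] To show why the corpus-wide identifier matters, here is a hypothetical Python sketch in which the statistics are keyed on the entity URI rather than on the surface text, so that different mentions of the same instance are counted together. It is not KIM's implementation; the function names, the (surface, uri) mention format, the example.org URIs, and the smoothed IDF weighting are assumptions.

```python
import math
from collections import Counter


def entity_document_frequency(corpus_mentions):
    # corpus_mentions: one list of (surface_text, entity_uri) pairs per document.
    df = Counter()
    for mentions in corpus_mentions:
        df.update({uri for _, uri in mentions})
    return df


def key_entities(doc_mentions, df, n_docs, top_n=3):
    # Count mentions per URI, so "IBM" and "International Business Machines"
    # contribute to the same entity when they share one identifier.
    tf = Counter(uri for _, uri in doc_mentions)
    scores = {
        uri: count * (math.log((1 + n_docs) / (1 + df.get(uri, 0))) + 1)
        for uri, count in tf.items()
    }
    return sorted(scores, key=lambda uri: (-scores[uri], uri))[:top_n]


# Hypothetical identifiers standing in for knowledge base instance URIs.
IBM = "http://example.org/kb/IBM"
SOFIA = "http://example.org/kb/Sofia"
```

Without the shared URI, the two surface forms of the same company would be scored as two unrelated strings, which is the gap the paragraph above points to.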

all the best
 borislav

On Mar 2, 2010, at 4:49 PM, Harrill, David C wrote:


To whom it may concern,

In working with the KIM tool, I came across the Document Detail screen, which displays both the Features associated with the document and the document content. Within the Features section there are two Features (KeyEntities and KeyPhrases). Are these two features derived from the GATE application, and if so, using which GATE plug-in? Otherwise, how do these entities and phrases get populated on this screen? I appreciate any information you can provide on this matter and look forward to hearing from you.

Thanks,
Dave

_______________________________________________
Kim-discussion mailing list
Kim-discussion@ontotext.com
http://ontotext.com/mailman/listinfo/kim-discussion


