Many thanks, 

Got it. 


Best, 

-M-
-- 
Maatari Daniel Okouya
Sent with Airmail

On 28 May 2014 at 06:36:52, Rupert Westenthaler (rupert.westentha...@gmail.com) 
wrote:

Hi Maatari,  

On Tue, May 27, 2014 at 2:53 PM, Maatari Daniel Okouya  
<okouy...@yahoo.fr> wrote:  
> Hi, thanks for your answer.  
>  
> I mean Topic Annotation.  
>  

Currently the only available Topic Classification engine in Stanbol is
the one described in [1]. As Stanbol does not ship with pre-trained
models (e.g. for IPTC or similar thesauri), you will need to train your
own models. [1] also provides an introduction to how to do that.

This year I am mentoring a GSoC (Google Summer of Code) project that
is about defining a clear Topic Classification API [2] [3] and adding
two additional implementations of such engines.

> Ultimately what I would like to have is something like: { PDFuri
> foaf:primaryTopic London . } as a triple in the returned RDF.
>  
> But for now, i don’t concern myself with using FOAF.  
>  

Topic Engines will always use fise:TopicAnnotation to describe
extracted topics. If you just want "{PDF-uri} foaf:primaryTopic
{topic-uri}" you can easily get this by taking the topics referenced
by the fise:TopicAnnotations and linking them directly to the
ContentItem using foaf:primaryTopic.
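
As a rough illustration of that post-processing step, here is a minimal
Python sketch. It works on plain (subject, predicate, object) tuples so it
does not depend on any RDF library. The property names fise:entity-reference
(annotation to topic) and fise:extracted-from (annotation to ContentItem),
as well as the FISE namespace URI, are assumptions that should be checked
against the enhancement structure documentation; the function name is made
up for this example.

```python
# Assumed namespace and property names; verify them against the
# Stanbol enhancement structure documentation before relying on them.
FISE = "http://fise.iks-project.eu/ontology/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PRIMARY_TOPIC = "http://xmlns.com/foaf/0.1/primaryTopic"


def primary_topic_triples(triples):
    """Derive (content item, foaf:primaryTopic, topic) triples.

    `triples` is an iterable of (subject, predicate, object) string tuples,
    e.g. parsed from the RDF returned by the enhancer.
    """
    triples = list(triples)

    # Collect every resource typed as fise:TopicAnnotation.
    annotations = {
        s for s, p, o in triples
        if p == RDF_TYPE and o == FISE + "TopicAnnotation"
    }

    topics = {}  # annotation -> referenced topic URIs
    items = {}   # annotation -> content item URIs
    for s, p, o in triples:
        if s not in annotations:
            continue
        if p == FISE + "entity-reference":
            topics.setdefault(s, set()).add(o)
        elif p == FISE + "extracted-from":
            items.setdefault(s, set()).add(o)

    # Link each topic directly to the content item it was extracted from.
    result = set()
    for annotation in annotations:
        for item in items.get(annotation, ()):
            for topic in topics.get(annotation, ()):
                result.add((item, FOAF_PRIMARY_TOPIC, topic))
    return sorted(result)
```

Annotations of other types (fise:TextAnnotation, fise:EntityAnnotation) are
ignored, so the result contains only the topic links.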

> I just want to have the main topics of the PDF. I don’t necessarily want to
> extract all the entities, etc.
>
> So maybe, in terms of the annotations generated, I would say having
> neither fise:EntityAnnotation nor fise:TextAnnotation, but simply
> fise:TopicAnnotation.
>  

No problem, just configure an Enhancement Chain with the following engines:

* tika engine: to extract plain text from the PDFs
* langdetect engine: to detect the language (alternatively, you can
pass the language by setting the Content-Language HTTP header in
requests)
* the topic engine, configured with the model you trained.
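
Once such a chain exists, sending a PDF to it could look roughly like the
following Python sketch. The chain name "topic-chain", the host and port,
and the /enhancer/chain/<name> path are assumptions for illustration; adjust
them to your own Stanbol instance. The helper names are made up for this
example. The optional Content-Language header corresponds to the
alternative to the langdetect engine mentioned above.

```python
import urllib.request


def build_enhance_request(pdf_bytes, base_url="http://localhost:8080",
                          chain="topic-chain", language=None):
    """Build (but do not send) a POST request to an enhancement chain.

    base_url, chain and the endpoint path are assumed values.
    """
    url = f"{base_url}/enhancer/chain/{chain}"
    headers = {
        "Content-Type": "application/pdf",  # the uploaded document is a PDF
        "Accept": "text/turtle",            # ask for the RDF result as Turtle
    }
    if language:
        # Pass the language explicitly instead of detecting it.
        headers["Content-Language"] = language
    return urllib.request.Request(url, data=pdf_bytes, headers=headers,
                                  method="POST")


def enhance(pdf_path, **kwargs):
    """Send the PDF at pdf_path to the chain and return the RDF as text."""
    with open(pdf_path, "rb") as f:
        request = build_enhance_request(f.read(), **kwargs)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling enhance("paper.pdf", language="en") against a running instance
should return the enhancement RDF, which can then be filtered for
fise:TopicAnnotation.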

best  
Rupert  

[1] http://www.iks-project.eu/sites/default/files/Topic-Classification.pdf  
[2] http://furkankamaci.com/gsoc-2014-acceptance-apache-stanbol/  
[3] https://issues.apache.org/jira/browse/STANBOL-1294  
>  
> --  
> Maatari Daniel Okouya  
>  
> On 27 May 2014 at 13:08:38, Rupert Westenthaler 
> (rupert.westentha...@gmail.com) wrote:  
>  
> On Tue, May 27, 2014 at 12:49 PM, Maatari Daniel Okouya  
> <okouy...@yahoo.fr> wrote:  
>> Hi,  
>>  
>> I have just started to use Apache Stanbol. I’m still playing around with it
>> to figure out everything that is out there. However, I’m puzzled by one
>> thing. I would like to configure it such that, upon uploading a text or a PDF
>> document, an RDF graph containing only the topic of the PDF is returned.
>>  
>  
> What do you mean by "topic"? In case of PDF files the Tika Engine [1]  
> can extract metadata. Such metadata are directly added to the URI of  
> the contentItem and do not use FISE.  
>  
>> I’m scratching my head but I don’t see how to do so. What is the engine that
>> is supposed to produce <<Fise:Annotation>>
>>  
>  
> All Stanbol Engines do generate FISE enhancements  
> (fise:TextAnnotation, fise:EntityAnnotation and fise:TopicAnnotation)  
>  
> When you look at the list of engines [2]  
>  
> * Language Detection engines create a fise:TextAnnotation describing  
> the language of the document (?la dc:type dc:LinguisticSystem; ?la  
> dc:language ?lang)  
> * Named Entity Recognition (NER) Engines create fise:TextAnnotations  
> for Entities recognized by the NLP framework.  
> * Linking / Suggestions create fise:EntityAnnotation for Entities  
> found in the text. They might also add fise:TextAnnotation to mark the  
> exact mention of such entities in the text.  
> * Topic Classification engines use fise:TopicAnnotation to describe  
> assigned topics. They also use a fise:TextAnnotation to mark the part  
> of the text the topic is assigned to  
>  
>> as described in 
>> http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html
>>   
>  
> Yep this page describes the annotations as created by the EnhancementEngines. 
>  
>  
>  
> Without knowing what you mean by " ... only the topic of the pdf ..."
> I cannot recommend a suitable Stanbol configuration.
>  
> best  
> Rupert  
>  
>>  
>>  
>  
>  
> [1] 
> http://stanbol.apache.org/docs/trunk/components/enhancer/engines/tikaengine  
> [2] http://stanbol.apache.org/docs/trunk/components/enhancer/engines/list  
>  
>> I would appreciate if someone could provide me with some pointers.  
>>  
>> Many thanks,  
>>  
>> Maatary  
>>  
>> --  
>> Maatari Daniel Okouya  
>  



--  
| Rupert Westenthaler rupert.westentha...@gmail.com  
| Bodenlehenstraße 11 ++43-699-11108907  
| A-5500 Bischofshofen  
| REDLINK.CO 
..........................................................................  
| http://redlink.co/  
