Hi Maatari,

On Tue, May 27, 2014 at 2:53 PM, Maatari Daniel Okouya
<okouy...@yahoo.fr> wrote:
> Hi, thanks for your answer.
>
> I mean Topic Annotation.
>

Currently the only available Topic Classification engine in Stanbol is
the one described by [1]. As Stanbol does not ship with pre-trained
models (e.g. for IPTC or similar thesauri) you will need to train your
own models. [1] also provides an introduction on how to do that.

This year I am mentoring a GSoC (Google Summer of Code) project that
is about defining a clear Topic Classification API [2] [3] and two
additional implementations of such engines.

> Ultimately what I would like to have is something like: { PDFuri 
> FoaF:PrimaryTopic London  . }   as a triple in the returned RDF.
>
> But for now, I don’t concern myself with using FOAF.
>

Topic Engines will always use fise:TopicAnnotation to describe
extracted topics. If you just want "{PDF-uri} foaf:primaryTopic
{topic-uri}" you can easily get this by taking the topics referenced
by the fise:TopicAnnotations and linking them directly to the
ContentItem using foaf:primaryTopic.
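
A minimal sketch of that re-linking step, in plain Python over
(subject, predicate, object) tuples. It assumes the topic URI is given by
fise:entity-reference and the ContentItem by fise:extracted-from, as in the
enhancement structure documentation; the function name and the example URIs
are made up for illustration.

```python
FISE = "http://fise.iks-project.eu/ontology/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PRIMARY_TOPIC = "http://xmlns.com/foaf/0.1/primaryTopic"


def primary_topic_triples(triples):
    """Derive {content-item} foaf:primaryTopic {topic} triples.

    `triples` is an iterable of (subject, predicate, object) string tuples
    taken from the enhancement results (hypothetical helper, not part of
    Stanbol).
    """
    triples = list(triples)
    # Collect the subjects typed as fise:TopicAnnotation.
    annotations = {
        s for s, p, o in triples
        if p == RDF_TYPE and o == FISE + "TopicAnnotation"
    }
    topic_of = {}   # annotation -> topic URI
    source_of = {}  # annotation -> ContentItem URI
    for s, p, o in triples:
        if s not in annotations:
            continue
        if p == FISE + "entity-reference":   # assumed property for the topic
            topic_of[s] = o
        elif p == FISE + "extracted-from":   # assumed property for the ContentItem
            source_of[s] = o
    # Link each topic directly to its ContentItem.
    return sorted(
        (source_of[a], FOAF_PRIMARY_TOPIC, topic_of[a])
        for a in annotations
        if a in topic_of and a in source_of
    )
```

Note that every fise:TopicAnnotation yields one triple here; if only the
best topic should be kept, the annotations would first need to be filtered,
for example by their fise:confidence value.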

> I just want to have the main topics of the PDF. I don’t necessarily want to 
> extract all the entities etc.
>
> So maybe in terms of the annotations generated I would say not having 
> fise:EntityAnnotation or fise:TextAnnotation but simply 
> fise:TopicAnnotation
>

No problem, just configure an Enhancement Chain with:

* tika engine: to extract plain text from the PDFs
* langdetect engine: to detect the language (as an alternative you can
also pass the language by setting the Content-Language HTTP header in
requests)
* the topic engine configured with the model you trained.
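
Once such a chain exists, a request against it could be built roughly as
follows. This is only a sketch: it assumes a Stanbol launcher running at
localhost:8080 and a chain named "topics" (both placeholders), and the
helper name is invented. It only constructs the request; sending it is left
to the caller.

```python
from urllib.request import Request

# Assumption: a Stanbol launcher reachable at its usual local address.
STANBOL_BASE = "http://localhost:8080"


def build_enhance_request(content, chain, language=None, base=STANBOL_BASE):
    """Build a POST request for the enhancer chain `chain`.

    `content` is the raw PDF as bytes. If `language` is given it is sent
    as the Content-Language header instead of relying on langdetect.
    """
    headers = {
        "Content-Type": "application/pdf",  # the tika engine extracts the text
        "Accept": "text/turtle",            # ask for the enhancements as Turtle
    }
    if language:
        headers["Content-Language"] = language
    return Request(
        f"{base}/enhancer/chain/{chain}",
        data=content,
        headers=headers,
        method="POST",
    )
```

To actually send it, pass the result to urllib.request.urlopen, e.g.
urlopen(build_enhance_request(pdf_bytes, "topics")), and read the RDF from
the response body.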

best
Rupert

[1] http://www.iks-project.eu/sites/default/files/Topic-Classification.pdf
[2] http://furkankamaci.com/gsoc-2014-acceptance-apache-stanbol/
[3] https://issues.apache.org/jira/browse/STANBOL-1294
>
> --
> Maatari Daniel Okouya
> Sent with Airmail
>
> On 27 May 2014 at 13:08:38, Rupert Westenthaler 
> (rupert.westentha...@gmail.com) wrote:
>
> On Tue, May 27, 2014 at 12:49 PM, Maatari Daniel Okouya
> <okouy...@yahoo.fr> wrote:
>> Hi,
>>
>> I have just started to use Apache Stanbol. I’m still playing around with it 
>> to figure out everything that is out there. However, I’m puzzled by one 
>> thing. I would like to configure it such that upon uploading a text or a PDF 
>> document, an RDF document containing only the topic of the PDF is returned.
>>
>
> What do you mean by "topic"? In case of PDF files the Tika Engine [1]
> can extract metadata. Such metadata are directly added to the URI of
> the contentItem and do not use FISE.
>
>> I’m scratching my head but I don’t see how to do so. What is the engine that 
>> is supposed to produce <<Fise:Annotation>>
>>
>
> All Stanbol Engines do generate FISE enhancements
> (fise:TextAnnotation, fise:EntityAnnotation and fise:TopicAnnotation)
>
> When you look at the list of engines [2]
>
> * Language Detection engines create a fise:TextAnnotation describing
> the language of the document (?la dc:type dc:LinguisticSystem; ?la
> dc:language ?lang)
> * Named Entity Recognition (NER) Engines create fise:TextAnnotations
> for Entities recognized by the NLP framework.
> * Linking / Suggestions create fise:EntityAnnotation for Entities
> found in the text. They might also add fise:TextAnnotation to mark the
> exact mention of such entities in the text.
> * Topic Classification engines use fise:TopicAnnotation to describe
> assigned topics. They also use a fise:TextAnnotation to mark the part
> of the text the topic is assigned to
>
>> as described in 
>> http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html
>
> Yep this page describes the annotations as created by the EnhancementEngines.
>
>
> Without knowing what you mean by " ... only the topic of the pdf ..."
> I cannot recommend suitable Stanbol configurations.
>
> best
> Rupert
>
>>
>>
>
>
> [1] 
> http://stanbol.apache.org/docs/trunk/components/enhancer/engines/tikaengine
> [2] http://stanbol.apache.org/docs/trunk/components/enhancer/engines/list
>
>> I would appreciate if someone could provide me with some pointers.
>>
>> Many thanks,
>>
>> Maatary
>>
>> --
>> Maatari Daniel Okouya
>> Sent with Airmail
>
>
>
> --
> | Rupert Westenthaler rupert.westentha...@gmail.com
> | Bodenlehenstraße 11 ++43-699-11108907
> | A-5500 Bischofshofen
> | REDLINK.CO 
> ..........................................................................
> | http://redlink.co/



