Hi, thanks for your answer. I mean Topic Annotation.
Ultimately, what I would like to have is something like:

{ PDFuri foaf:primaryTopic London . }

as a triple in the returned RDF. But for now, I am not concerned with using FOAF. I just want to have the main topics of the PDF. I don’t necessarily want to extract all the entities, etc.

So, in terms of the annotations generated, I would say: neither fise:EntityAnnotation nor fise:TextAnnotation, but simply fise:TopicAnnotation.

Many thanks

--
Maatari Daniel Okouya
Sent with Airmail

On 27 May 2014 at 13:08:38, Rupert Westenthaler (rupert.westentha...@gmail.com) wrote:

On Tue, May 27, 2014 at 12:49 PM, Maatari Daniel Okouya <okouy...@yahoo.fr> wrote:
> Hi,
>
> I have just started to use Apache Stanbol. I’m still playing around with it
> to figure out everything that is out there. However, I’m puzzled by one thing.
> I would like to configure it such that upon uploading a text or a PDF
> document, an RDF containing only the topic of the PDF shall be returned.
>

What do you mean by "topic"? In the case of PDF files, the Tika Engine [1] can extract metadata. Such metadata are added directly to the URI of the ContentItem and do not use FISE.

> I’m scratching my head but I don’t see how to do so. What is the engine that
> is supposed to produce <<Fise:Annotation>>
>

All Stanbol engines generate FISE enhancements (fise:TextAnnotation, fise:EntityAnnotation and fise:TopicAnnotation). When you look at the list of engines [2]:

* Language Detection engines create a fise:TextAnnotation describing the language of the document (?la dc:type dc:LinguisticSystem; ?la dc:language ?lang)
* Named Entity Recognition (NER) engines create fise:TextAnnotations for entities recognized by the NLP framework.
* Linking / Suggestion engines create fise:EntityAnnotations for entities found in the text. They might also add fise:TextAnnotations to mark the exact mention of such entities in the text.
* Topic Classification engines use fise:TopicAnnotation to describe assigned topics. They also use a fise:TextAnnotation to mark the part of the text the topic is assigned to.

> as described in
> http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html
>

Yep, this page describes the annotations as created by the EnhancementEngines.

Without knowing what you mean by " ... only the topic of the pdf ..." I cannot recommend suitable Stanbol configurations.

best
Rupert

[1] http://stanbol.apache.org/docs/trunk/components/enhancer/engines/tikaengine
[2] http://stanbol.apache.org/docs/trunk/components/enhancer/engines/list

> I would appreciate it if someone could provide me with some pointers.
>
> Many thanks,
>
> Maatary
>
> --
> Maatari Daniel Okouya
> Sent with Airmail

--
| Rupert Westenthaler rupert.westentha...@gmail.com
| Bodenlehenstraße 11 ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO ..........................................................................
| http://redlink.co/
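
For illustration only, here is a rough sketch of how the triple asked for above could be derived on the client side, by filtering the returned enhancements down to fise:TopicAnnotation. It is not a Stanbol configuration and uses no Stanbol API. The triples are plain Python tuples. The property names (fise:extracted-from, fise:entity-reference, fise:confidence) and the FISE namespace are assumed from the enhancement structure page linked in the thread. The sample URIs and the `primary_topics` helper are made up.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FISE = "http://fise.iks-project.eu/ontology/"  # assumed FISE namespace
FOAF_PRIMARY_TOPIC = "http://xmlns.com/foaf/0.1/primaryTopic"


def primary_topics(triples):
    """Return one (content item, foaf:primaryTopic, topic) triple per content item.

    `triples` is an iterable of (subject, predicate, object) string tuples.
    Only subjects typed as fise:TopicAnnotation are considered; for each
    content item the topic with the highest fise:confidence is kept.
    """
    triples = list(triples)  # iterated twice below

    # Subjects that are topic annotations; everything else is ignored.
    topic_nodes = {
        s for s, p, o in triples
        if p == RDF_TYPE and o == FISE + "TopicAnnotation"
    }

    # Gather the properties of each topic annotation.
    props = {node: {} for node in topic_nodes}
    for s, p, o in triples:
        if s in props:
            props[s][p] = o

    # Keep the most confident topic for each content item.
    best = {}
    for node in props.values():
        item = node.get(FISE + "extracted-from")
        topic = node.get(FISE + "entity-reference")
        if item is None or topic is None:
            continue
        confidence = float(node.get(FISE + "confidence", 0.0))
        if item not in best or confidence > best[item][0]:
            best[item] = (confidence, topic)

    return [(item, FOAF_PRIMARY_TOPIC, topic) for item, (_, topic) in best.items()]


# Hypothetical enhancement results for one uploaded PDF.
sample = [
    ("urn:enh:1", RDF_TYPE, FISE + "TopicAnnotation"),
    ("urn:enh:1", FISE + "extracted-from", "urn:content-item:pdf1"),
    ("urn:enh:1", FISE + "entity-reference", "http://dbpedia.org/resource/London"),
    ("urn:enh:1", FISE + "confidence", "0.9"),
    ("urn:enh:2", RDF_TYPE, FISE + "TopicAnnotation"),
    ("urn:enh:2", FISE + "extracted-from", "urn:content-item:pdf1"),
    ("urn:enh:2", FISE + "entity-reference", "http://dbpedia.org/resource/Paris"),
    ("urn:enh:2", FISE + "confidence", "0.4"),
    ("urn:enh:3", RDF_TYPE, FISE + "EntityAnnotation"),
    ("urn:enh:3", FISE + "extracted-from", "urn:content-item:pdf1"),
    ("urn:enh:3", FISE + "entity-reference", "http://dbpedia.org/resource/Thames"),
]

result = primary_topics(sample)
```

Since a document has a single primary topic, the sketch keeps only the highest-confidence topic per content item. If all topics are wanted, the selection step can be dropped and a different property used.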