Hi,

With the Multipart ContentItem API it is already possible to pass existing metadata to the Stanbol Enhancer. There is even an example of how to achieve this in the documentation of the RESTful API of the Enhancer [1].

STANBOL-660 is very specific, as it only asks to support passing the language by setting the "Content-Language" header of an enhancement request. So one option to implement that would be to

(1) forward all/selected header fields as EnhancementProperties
(2) implement an EnhancementEngine that takes the "Content-Language" header and converts it to a fise:TextAnnotation defining the language of the content (as defined by STANBOL-613).

@David: Passing a reference to the content (instead of the content itself) is supported in the Java API of the Stanbol Enhancer by using ContentReferences [2] to construct a ContentItem. AFAIK it is not possible to use this feature via the RESTful API of the Enhancer. However, it is used by the Contenthub. If you have a strong use case for passing URLs instead of content to the Enhancer, please feel free to open a JIRA issue for that.

best
Rupert

[1] http://stanbol.apache.org/docs/trunk/components/enhancer/enhancerrest.html#example-4-parse-existing-free-text-annotations
[2] http://stanbol.apache.org/docs/trunk/components/enhancer/contentitemfactory.html#ContentReference

On Tue, Dec 11, 2012 at 1:40 PM, Fabian Christ <[email protected]> wrote:
> Hi,
>
> I just came along STANBOL-660 which requires to send the language of the
> content as an additional information to Stanbol.
>
> We may need to define a standard way how to pass existing metadata to
> Stanbol.
>
> [1] https://issues.apache.org/jira/browse/STANBOL-660
>
>
> 2012/12/11 Fabian Christ <[email protected]>
>
>> Hi,
>>
>> most engines in Stanbol can only handle plain text. To support other
>> formats we use the Tika engine which converts binary formats like PDF into
>> plain text.
>>
>> I do not know what happens with HTML content right now in the Tika engine.
>>
>> We had discussions in the past that Stanbol should support to receive RDFa
>> annotated HTML, strip of the HTML tags, enhance the text, re-add the HTML
>> tags and add the new enhancements as RDFa by preserving the existing RDFa.
>> Maybe the existing RDFa could also be used as an important input for some
>> engines. It is the case where already some metadata exist that could be
>> used by Stanbol. But such a cool feature would require a new engine.
>>
>> Best,
>> - Fabian
>>
>>
>> 2012/12/11 David Riccitelli <[email protected]>
>>
>>> Hello,
>>>
>>> Does Stanbol currently support the analysis of the content of a URL?
>>>
>>> If yes, how does this work according to the different content types, e.g.:
>>> 1. for text/plain does it fetch and analyse the whole text?
>>> 2. for text/html does it fetch and analyse only the TITLE and the BODY
>>> (stripped of the HTML tags)?
>>> 3. are other content types supported?
>>>
>>> Thanks,
>>> David
>>>
>>> --
>>> David Riccitelli
>>>
>>>
>>> ********************************************************************************
>>> InsideOut10 s.r.l.
>>> P.IVA: IT-11381771002
>>> Fax: +39 0110708239
>>> ---
>>> LinkedIn: http://it.linkedin.com/in/riccitelli
>>> Twitter: ziodave
>>> ---
>>> Layar Partner Network<
>>> http://www.layar.com/publishing/developers/list/?page=1&country=&city=&keyword=insideout10&lpn=1
>>> >
>>>
>>> ********************************************************************************
>>>
>>
>>
>>
>> --
>> Fabian
>> http://twitter.com/fctwitt
>>
>
>
>
> --
> Fabian
> http://twitter.com/fctwitt

--
| Rupert Westenthaler [email protected]
| Bodenlehenstraße 11 ++43-699-11108907
| A-5500 Bischofshofen
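
To make the "Content-Language" option discussed in the reply a bit more concrete, here is a rough sketch of the conversion step only. It is not Stanbol code: a real engine would be written in Java against the EnhancementEngine interface. The function name, the plain-tuple triple representation and the annotation URI scheme are hypothetical, and the namespace and property names (fise:extracted-from, dc:type, dc:language) are assumptions that should be checked against STANBOL-613 and the enhancement structure documentation.

```python
import uuid

# Namespace URIs as assumed for this sketch; verify against your Stanbol version.
FISE = "http://fise.iks-project.eu/ontology/"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def language_annotation(headers, content_item_uri):
    """Return triples describing the content language, or [] if no header is set.

    `headers` stands in for the request headers forwarded to the engine.
    """
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    value = normalised.get("content-language", "").strip()
    if not value:
        return []
    # Content-Language may list several tags ("de, en"); this sketch uses the first.
    language = value.split(",")[0].strip()
    if not language:
        return []
    # Hypothetical URI scheme for the new annotation resource.
    annotation = "urn:example:enhancement:" + str(uuid.uuid4())
    return [
        (annotation, RDF_TYPE, FISE + "TextAnnotation"),
        (annotation, FISE + "extracted-from", content_item_uri),
        (annotation, DCTERMS + "type", DCTERMS + "LinguisticSystem"),
        (annotation, DCTERMS + "language", language),
    ]
```

Called with `{"Content-Language": "de"}` and the URI of the ContentItem, this yields four triples for a single language annotation. Requests without the header produce no annotation, so a language detection engine later in the chain could still do its work.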
