Hi,

I just came across STANBOL-660 [1], which requires sending the language of the content to Stanbol as additional information.
We may need to define a standard way to pass existing metadata to Stanbol.

[1] https://issues.apache.org/jira/browse/STANBOL-660

2012/12/11 Fabian Christ <[email protected]>

> Hi,
>
> most engines in Stanbol can only handle plain text. To support other
> formats we use the Tika engine, which converts binary formats like PDF into
> plain text.
>
> I do not know what happens with HTML content right now in the Tika engine.
>
> We had discussions in the past that Stanbol should support receiving RDFa
> annotated HTML, strip off the HTML tags, enhance the text, re-add the HTML
> tags and add the new enhancements as RDFa while preserving the existing RDFa.
> Maybe the existing RDFa could also be used as an important input for some
> engines. It is a case where some metadata already exists that could be
> used by Stanbol. But such a cool feature would require a new engine.
>
> Best,
> - Fabian
>
>
> 2012/12/11 David Riccitelli <[email protected]>
>
>> Hello,
>>
>> Does Stanbol currently support the analysis of the content of a URL?
>>
>> If yes, how does this work according to the different content types, e.g.:
>> 1. for text/plain does it fetch and analyse the whole text?
>> 2. for text/html does it fetch and analyse only the TITLE and the BODY
>>    (stripped of the HTML tags)?
>> 3. are other content types supported?
>>
>> Thanks,
>> David
>>
>> --
>> David Riccitelli
>>
>> InsideOut10 s.r.l.
>> P.IVA: IT-11381771002
>> Fax: +39 0110708239
>> ---
>> LinkedIn: http://it.linkedin.com/in/riccitelli
>> Twitter: ziodave
>> ---
>> Layar Partner Network
>> <http://www.layar.com/publishing/developers/list/?page=1&country=&city=&keyword=insideout10&lpn=1>
>
> --
> Fabian
> http://twitter.com/fctwitt

--
Fabian
http://twitter.com/fctwitt
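
To make the HTML case discussed above a bit more concrete, here is a minimal Python sketch of the "strip the tags, keep the existing metadata" step. It is not how Stanbol or the Tika engine actually handle HTML (that is left open in this thread); the names `TextAndLanguageExtractor` and `extract` are made up for illustration, and reading the language from the `lang` attribute of the `<html>` element is only one possible source of existing metadata.

```python
from html.parser import HTMLParser


class TextAndLanguageExtractor(HTMLParser):
    """Hypothetical helper: collects visible text and the declared language."""

    # Elements whose text content is not visible to a reader.
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.language = None      # value of <html lang="...">, if present
        self._chunks = []         # visible text fragments in document order
        self._skip_depth = 0      # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "html" and self.language is None:
            self.language = dict(attrs).get("lang")
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text outside of skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    @property
    def text(self):
        return " ".join(self._chunks)


def extract(html):
    """Return (plain_text, language); language is None if not declared."""
    parser = TextAndLanguageExtractor()
    parser.feed(html)
    parser.close()
    return parser.text, parser.language
```

The returned plain text is what a plain-text engine could work on, and the language is the kind of existing metadata that would have to be passed along with it. Re-adding the tags and writing enhancements back as RDFa is not covered by this sketch.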
