Hi,

I just came across STANBOL-660 [1], which requires sending the language of
the content to Stanbol as additional information.

We may need to define a standard way to pass existing metadata to
Stanbol.
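
To make the idea concrete, here is a minimal sketch of one way a client could
attach the content language to a request. It only builds the request and does
not send it. The endpoint URL (a local launcher on port 8080) and the helper
name build_enhancer_request are assumptions for illustration. Content-Language
is a standard HTTP header, but whether Stanbol would read it is exactly the
open question, so treat this as a proposal and not as existing behaviour.

```python
from urllib.request import Request

# Assumption: enhancer endpoint of a locally running Stanbol launcher.
ENHANCER_URL = "http://localhost:8080/enhancer"


def build_enhancer_request(text, language=None, url=ENHANCER_URL):
    """Build (but do not send) a POST request carrying plain-text content.

    Hypothetical helper: the language is passed in the standard
    Content-Language header. That Stanbol honours this header is an
    assumption, not something confirmed in this thread.
    """
    headers = {
        "Content-Type": "text/plain; charset=utf-8",
        "Accept": "application/rdf+xml",
    }
    if language:
        # Existing metadata travels as a header next to the content.
        headers["Content-Language"] = language
    return Request(url, data=text.encode("utf-8"), headers=headers, method="POST")


req = build_enhancer_request("Paris ist die Hauptstadt von Frankreich.", language="de")
```

Other existing metadata could be passed the same way, or as request
parameters. Which of the two fits better is part of what we would have to
decide.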

[1] https://issues.apache.org/jira/browse/STANBOL-660


2012/12/11 Fabian Christ <[email protected]>

> Hi,
>
> Most engines in Stanbol can only handle plain text. To support other
> formats, we use the Tika engine, which converts binary formats such as PDF
> into plain text.
>
> I do not know what currently happens to HTML content in the Tika engine.
>
> We have discussed in the past that Stanbol should be able to receive
> RDFa-annotated HTML, strip off the HTML tags, enhance the text, re-add the
> HTML tags, and add the new enhancements as RDFa while preserving the
> existing RDFa. The existing RDFa might also serve as important input for
> some engines, since this is a case where metadata already exists that
> Stanbol could use. But such a cool feature would require a new engine.
>
> Best,
>  - Fabian
>
>
> 2012/12/11 David Riccitelli <[email protected]>
>
>> Hello,
>>
>> Does Stanbol currently support the analysis of the content of a URL?
>>
>> If yes, how does this work according to the different content types, e.g.:
>>  1. for text/plain does it fetch and analyse the whole text?
>>  2. for text/html does it fetch and analyse only the TITLE and the BODY
>> (stripped of the HTML tags)?
>>  3. are other content types supported?
>>
>> Thanks,
>> David
>>
>> --
>> David Riccitelli
>>
>>
>> ********************************************************************************
>> InsideOut10 s.r.l.
>> P.IVA: IT-11381771002
>> Fax: +39 0110708239
>> ---
>> LinkedIn: http://it.linkedin.com/in/riccitelli
>> Twitter: ziodave
>> ---
>> Layar Partner Network<
>> http://www.layar.com/publishing/developers/list/?page=1&country=&city=&keyword=insideout10&lpn=1
>> >
>>
>> ********************************************************************************
>>
>
>
>
> --
> Fabian
> http://twitter.com/fctwitt
>



-- 
Fabian
http://twitter.com/fctwitt
