Hi Rüdiger,
RDFa extraction from HTML is part of the htmlextractor engine in
Stanbol. I would welcome it if you could test it with your OpenCms docs.
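A minimal sketch of how such a test could be driven from a script. It
assumes a Stanbol launcher running locally with the enhancer reachable at
http://localhost:8080/enhancer, and it assumes that the enhancement chain
used there includes the htmlextractor engine; both need to be checked
against the actual installation. The function names are made up for
illustration.

```python
import urllib.request

# Assumption: a local Stanbol launcher with the enhancer at this URL.
STANBOL_ENHANCER_URL = "http://localhost:8080/enhancer"


def build_enhancer_request(html, url=STANBOL_ENHANCER_URL):
    """Build (but do not send) a POST request carrying an HTML document."""
    return urllib.request.Request(
        url,
        data=html.encode("utf-8"),
        headers={
            # Tell the enhancer that the payload is HTML, not plain text.
            "Content-Type": "text/html; charset=utf-8",
            # Ask for the enhancement results as Turtle.
            "Accept": "text/turtle",
        },
        method="POST",
    )


def enhance(html, url=STANBOL_ENHANCER_URL):
    """Send the document to the enhancer and return the response body."""
    with urllib.request.urlopen(build_enhancer_request(html, url)) as response:
        return response.read().decode("utf-8")
```

Calling enhance() with the HTML of an OpenCms page should return the
enhancement graph as text, which can then be inspected for the
statements that came from the RDFa markup.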
Best regards,
Walter
Rüdiger Kurz wrote:
Hi Stanbolers,
during ApacheCon in Sinsheim I had some interesting conversations with
Fabian, Rupert and Anil. As a result, I want to summarize one of the
discussions as a user story that describes a typical requirement for us
as a CMS provider.
It is not correct to assume that traditional Content Management Systems
don't store semantic information. For example, CMSs already deliver
RDFa-annotated HTML, and nearly all systems provide some
tagging/categorizing mechanism. OpenCms in particular provides a generic
approach to defining structured content, so we know that a specific
field/item of a content has a specified type and a defined label. E.g.,
"a technology event named ApacheCon takes place in Sinsheim from 05. Nov
until 08. Nov 2012" is information that is already stored in OpenCms.
Moreover, OpenCms is able to connect that event with all
speakers/persons that will give a presentation at that event, ...
What we would like to achieve is not only plain-text enhancement; we
are also interested in telling Stanbol all the information and
associations we already know. In other words, we absolutely don't want
to lose the semantic information that already exists in OpenCms.
A good starting point would be a REST endpoint that is able to retrieve
an RDFa-annotated HTML document and then extracts the RDFa in order to
store it inside the semantic-index/entity-hub/..., as I
previously suggested on the list under the subject "Extend stanbol
content hub for RDFa support". Maybe the content hub is not the right
component, but the requirement for RDFa extraction still exists.
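To make the requirement concrete, here is a minimal sketch of what
"extracting the RDFa" from an annotated page amounts to. It is a toy that
only understands the about, typeof, property and content attributes on
well-formed markup; it is not a conforming RDFa processor and not how
Stanbol does it. The sample markup, the schema: terms and the #apachecon
identifier are assumptions made up for illustration, not actual OpenCms
output.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag and therefore open no scope.
VOID_TAGS = {"meta", "link", "br", "img", "input", "hr"}


class SimpleRdfaExtractor(HTMLParser):
    """Collect (subject, property, value) triples from a small RDFa subset."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.triples = []
        # One entry per open element: [subject, pending_property, text_parts]
        self._stack = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        inherited = self._stack[-1][0] if self._stack else ""
        subject = a.get("about") or inherited
        if a.get("typeof"):
            self.triples.append((subject, "rdf:type", a["typeof"]))
        prop = a.get("property")
        if prop and a.get("content") is not None:
            # Literal given in the attribute; nothing to wait for.
            self.triples.append((subject, prop, a["content"]))
            prop = None
        if tag not in VOID_TAGS:
            self._stack.append([subject, prop, []])

    def handle_data(self, data):
        # Text belongs to every open element still waiting for its literal.
        for entry in self._stack:
            if entry[1] is not None:
                entry[2].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._stack:
            return
        subject, prop, parts = self._stack.pop()
        if prop is not None:
            self.triples.append((subject, prop, "".join(parts).strip()))


def extract_rdfa(html):
    """Return the triples found in an RDFa-annotated HTML string."""
    parser = SimpleRdfaExtractor()
    parser.feed(html)
    parser.close()
    return parser.triples


# Hypothetical markup for the ApacheCon example mentioned above.
EXAMPLE_HTML = """
<div about="#apachecon" typeof="schema:Event">
  <span property="schema:name">ApacheCon</span>
  <span property="schema:location">Sinsheim</span>
  <meta property="schema:startDate" content="2012-11-05">
  <meta property="schema:endDate" content="2012-11-08">
</div>
"""

if __name__ == "__main__":
    for triple in extract_rdfa(EXAMPLE_HTML):
        print(triple)
```

The triples produced this way are the kind of statements we would like
to hand over to Stanbol together with the plain text.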
--
Dr. Walter Kasper
DFKI GmbH
Stuhlsatzenhausweg 3
D-66123 Saarbrücken
Tel.: +49-681-85775-5300
Fax: +49-681-85775-5338
Email: [email protected]