Hi Rüdiger,
The engine simply adds the extracted microdata annotations as RDF to the
item's metadata, linked to the content item by the
"http://www.semanticdesktop.org/ontologies/2007/01/19/nie#contains"
property. So it should be straightforward to build an index as you
described by extracting them from the metadata.
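
As a rough illustration of that extraction step, here is a minimal Python
sketch. It assumes the metadata has already been read into plain
(subject, predicate, object) tuples; the content item URIs, the blank node
labels and the schema.org properties in the sample data are hypothetical,
and the RDF the engine actually emits may look different.

```python
from collections import defaultdict

# Property that links a content item to its extracted microdata items.
NIE_CONTAINS = "http://www.semanticdesktop.org/ontologies/2007/01/19/nie#contains"


def build_entity_index(triples, content_item):
    """Return {entity: {property: [values]}} for one content item.

    `triples` is an iterable of (subject, predicate, object) string tuples.
    """
    triples = list(triples)
    # Entities the content item points at through nie#contains.
    entities = [o for s, p, o in triples
                if s == content_item and p == NIE_CONTAINS]
    index = {}
    for entity in entities:
        props = defaultdict(list)
        # Collect every statement made about this entity.
        for s, p, o in triples:
            if s == entity:
                props[p].append(o)
        index[entity] = dict(props)
    return index


# Hypothetical sample metadata (not real engine output).
triples = [
    ("urn:content-item-1", NIE_CONTAINS, "_:item0"),
    ("_:item0", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://schema.org/Person"),
    ("_:item0", "http://schema.org/name", "Jane Doe"),
    ("urn:content-item-2", NIE_CONTAINS, "_:item1"),
]

index = build_entity_index(triples, "urn:content-item-1")
```

The resulting dictionary could then be turned into whatever documents
your index expects, for example one record per entity plus a list of
entity references on the record for the document itself.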
Best regards,
Walter Kasper
Rüdiger Kurz wrote:
Hi Walter,
thanks for the quick reply. Are the entities extracted by the
htmlextractor enhancement engine automatically stored in the entity
hub?
What I want to achieve is an index that stores the extracted
entities and also the document itself, with references to the entities
related to this document. It would be great if that could be done by
configuration only. Maybe someone could lend me a hand with building
the right enhancement chain as a first step.
What I have in mind is a Solr search UI offering entity-based
autosuggestion, including a spellchecker and faceted search.
Thanks again.
On 16.03.2013 at 16:29, Walter Kasper wrote:
Dear Rüdiger,
The htmlextractor enhancement engine provides a microdata extractor that
should work well for schema.org annotations. Just test it with your
data.
Best regards,
Walter
Rüdiger Kurz wrote:
Hello Stanbolers,
I want to extract and then store entities from HTML documents that
use Microdata annotations based on the schema.org type hierarchy
as the ontology. I would appreciate any kind of approach, including the
use of VIE.
Many thanks in advance
Rüdiger
--
Dr. Walter Kasper
DFKI GmbH
Stuhlsatzenhausweg 3
D-66123 Saarbrücken
Tel.: +49-681-85775-5300
Fax: +49-681-85775-5338
Email: [email protected]