Hi Rüdiger,

The engine just adds the extracted microdata annotations as RDF to the item's metadata, linked to the content item by the "http://www.semanticdesktop.org/ontologies/2007/01/19/nie#contains" property. So it should be straightforward to build an index as you described by extracting them from the metadata.
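
As a rough sketch of that extraction step: the snippet below assumes the metadata has already been fetched and flattened into plain (subject, predicate, object) string tuples, and that the content item is the subject of the nie#contains statements. The function name, the tuple representation and the sample URIs are hypothetical and only for illustration; they are not part of the Stanbol API.

```python
from collections import defaultdict

# Property named in this thread; everything else below is illustrative.
NIE_CONTAINS = "http://www.semanticdesktop.org/ontologies/2007/01/19/nie#contains"


def build_entity_index(triples, content_item):
    """Group the properties of every entity the content item contains.

    `triples` is an iterable of (subject, predicate, object) string tuples,
    a hypothetical stand-in for the RDF metadata of the content item.
    """
    triples = list(triples)
    # Assumption: entities are the objects of nie#contains statements
    # whose subject is the content item.
    entities = [o for s, p, o in triples
                if s == content_item and p == NIE_CONTAINS]
    index = {}
    for entity in entities:
        props = defaultdict(list)
        for s, p, o in triples:
            if s == entity:
                props[p].append(o)
        index[entity] = dict(props)
    return index


# Made-up sample data, not real enhancer output.
ITEM = "urn:content-item:example"
SAMPLE = [
    (ITEM, NIE_CONTAINS, "urn:entity:1"),
    ("urn:entity:1", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://schema.org/Person"),
    ("urn:entity:1", "http://schema.org/name", "Jane Doe"),
    ("urn:other", "http://schema.org/name", "Not linked"),
]

index = build_entity_index(SAMPLE, ITEM)
```

Each entry of the resulting dictionary could then be turned into one index document per entity, with the content item URI kept as a back-reference.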

Best regards,

Walter Kasper

Rüdiger Kurz wrote:
Hi Walter,

thanks for the quick reply. Are the extracted entities from the htmlextractor enhancement engine automatically stored into the entity hub?

What I want to achieve is an index that stores the extracted entities as well as the document itself, with references to the entities related to that document. It would be great if that could be done by configuration only. Maybe someone could lend me a hand with building the right enhancement chain as a first step.

What I have in mind is building a Solr search UI that offers entity-based autosuggestion, including a spellchecker and faceted search.

Thanks again.

On 16.03.2013 16:29, Walter Kasper wrote:
Dear Rüdiger,

The htmlextractor enhancement engine provides a microdata extractor that
should work well for schema.org annotations. Just test it with your data.

Best regards,

Walter

Rüdiger Kurz wrote:
Hello Stanbolers,

I want to extract and then store entities from HTML documents that
use Microdata annotations based on the schema.org type hierarchy
as the ontology. I appreciate any kind of approach, including the use of VIE.

Many thanks in advance
Rüdiger




--
Dr. Walter Kasper
DFKI GmbH
Stuhlsatzenhausweg 3
D-66123 Saarbrücken
Tel.:  +49-681-85775-5300
Fax:   +49-681-85775-5338
Email: [email protected]
-------------------------------------------------------------
Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
Registered office: Trippstadter Strasse 122, D-67663 Kaiserslautern

Management:
Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Chairman)
Dr. Walter Olthoff

Chairman of the Supervisory Board:
Prof. Dr. h.c. Hans A. Aukes

District Court (Amtsgericht) Kaiserslautern, HRB 2313
-------------------------------------------------------------
