Re: How to extract entities from documents using Microdata?

Rupert Westenthaler Sun, 17 Mar 2013 03:22:08 -0700

Hi Walter

On Sat, Mar 16, 2013 at 5:20 PM, Walter Kasper <[email protected]> wrote:
> Hi Rüdiger,
>
> The engine just adds the extracted microdata annotations as RDF to the item's
> metadata, linked to the content item by the
> "http://www.semanticdesktop.org/ontologies/2007/01/19/nie#contains"
> property. So it should be straightforward to build an index as you described
> by extracting them from the metadata.
>

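Walter's suggestion of pulling the microdata items back out of the metadata
could look roughly like the sketch below. It treats the metadata as a plain
list of (subject, predicate, object) tuples rather than using any Stanbol API.
The content item URI, the blank node name and the schema.org values are
hypothetical sample data, and only the nie#contains property comes from the
description above.

```python
# Property named in the quoted message; everything else here is sample data.
NIE_CONTAINS = "http://www.semanticdesktop.org/ontologies/2007/01/19/nie#contains"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def contained_items(triples, content_item):
    """Collect the nodes linked from content_item via nie#contains,
    together with all properties stated about those nodes."""
    items = {}
    # First pass: find every node the content item "contains".
    for s, p, o in triples:
        if s == content_item and p == NIE_CONTAINS:
            items[o] = {}
    # Second pass: gather the properties of each contained node.
    for s, p, o in triples:
        if s in items:
            items[s].setdefault(p, []).append(o)
    return items


# Hypothetical metadata for one content item with one microdata item.
SAMPLE = [
    ("urn:content-item:example", NIE_CONTAINS, "_:item1"),
    ("_:item1", RDF_TYPE, "http://schema.org/Person"),
    ("_:item1", "http://schema.org/name", "Jane Doe"),
]

extracted = contained_items(SAMPLE, "urn:content-item:example")
```

The resulting dictionary could then be mapped to the fields of whatever index
is used; that mapping is not shown here.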

I think the htmlextractor engine should also create a
"fise:EntityAnnotation" for each extracted Entity, so that the metadata
contains:

    {entityAnno} rdf:type fise:EntityAnnotation
    {entityAnno} fise:entity-reference {entity-uri}
    {entityAnno} fise:entity-label {entity-label}
    {entityAnno} fise:entity-type {entity-type}
    {entityAnno} fise:extracted-from {content-item-uri}
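
As a minimal sketch, the five statements above could be generated like this.
It uses plain Python tuples and a small N-Triples style serializer instead of
the Stanbol API. The namespace URI bound to "fise" and all URIs in the usage
example are assumptions made for illustration, so please check them against
your installation.

```python
from typing import NamedTuple

# Assumed namespace for the "fise" prefix; not stated in this thread.
FISE = "http://fise.iks-project.eu/ontology/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class Literal(NamedTuple):
    """Marks a plain string value, as opposed to a URI."""
    value: str


def entity_annotation(anno, entity_uri, label, entity_type, content_item):
    """Build the five statements describing one extracted Entity."""
    return [
        (anno, RDF_TYPE, FISE + "EntityAnnotation"),
        (anno, FISE + "entity-reference", entity_uri),
        (anno, FISE + "entity-label", Literal(label)),
        (anno, FISE + "entity-type", entity_type),
        (anno, FISE + "extracted-from", content_item),
    ]


def to_ntriples(triples):
    """Serialize triples of URIs and Literals as N-Triples style lines."""
    def term(t):
        if isinstance(t, Literal):
            escaped = t.value.replace("\\", "\\\\").replace('"', '\\"')
            return '"' + escaped + '"'
        return "<" + t + ">"
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


# Hypothetical values for one schema.org Person found in a document.
triples = entity_annotation(
    "urn:enhancement:example-1",
    "http://example.org/people/jane-doe",
    "Jane Doe",
    "http://schema.org/Person",
    "urn:content-item:example",
)
output = to_ntriples(triples)
```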

If your Engine can also determine the section in the text mentioning
the entity, then it would be good if you could also create a
fise:TextAnnotation for it.
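
A rough sketch of what such a fise:TextAnnotation could carry is shown below.
It locates the first occurrence of a mention in the plain text and records the
character offsets. The property names fise:selected-text, fise:start and
fise:end, the "fise" namespace URI, and the sample URIs and text are not taken
from this thread; they are assumptions to verify against the enhancement
structure documentation.

```python
# Assumed namespace for the "fise" prefix; not stated in this thread.
FISE = "http://fise.iks-project.eu/ontology/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def text_annotation(anno, text, mention, content_item):
    """Describe the first occurrence of mention in text.

    Returns an empty list when the mention does not occur, so no
    annotation is created for entities without a position in the text.
    """
    start = text.find(mention)
    if start < 0:
        return []
    end = start + len(mention)  # exclusive end offset
    return [
        (anno, RDF_TYPE, FISE + "TextAnnotation"),
        (anno, FISE + "selected-text", mention),
        (anno, FISE + "start", start),
        (anno, FISE + "end", end),
        (anno, FISE + "extracted-from", content_item),
    ]


# Hypothetical sample text and identifiers.
text = "The meeting is in Salzburg next week."
annotation = text_annotation(
    "urn:enhancement:example-2", text, "Salzburg", "urn:content-item:example"
)
```

An EntityAnnotation would then normally point at this TextAnnotation (for
example via a dc:relation statement, again an assumption here) so that later
engines can relate the entity to its position in the text.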

This might be important:

* because other Engines in the Enhancement Chain could then process
explicitly mentioned Entities in the same way as extracted ones
* because the Contenthub needs this information to correctly set the
Context for executing configured LDPaths.

best
Rupert

> Best regards,
>
> Walter Kasper
>
>
> Rüdiger Kurz wrote:
>>
>> Hi Walter,
>>
>> thanks for the quick reply. Are the extracted entities from the
>> htmlextractor enhancement engine automatically stored into the entity hub?
>>
>> What I want to achieve is an index that stores the extracted entities
>> and also the document itself, with references to the entities related to
>> this document. It would be great if that could be done by configuration only.
>> Maybe someone could lend me a hand with building the right enhancement chain
>> as a first step.
>>
>> What I have in mind is building a Solr search UI offering entity-based
>> autosuggestion, including spell checking and faceted search.
>>
>> Thanks again.
>>
>> On 16.03.2013 16:29, Walter Kasper wrote:
>>>
>>> Dear Rüdiger,
>>>
>>> The htmlextractor enhancement engine provides a microdata extractor that
>>> should work well for schema.org annotations. Just test it with your data.
>>>
>>> Best regards,
>>>
>>> Walter
>>>
>>> Rüdiger Kurz wrote:
>>>>
>>>> Hello Stanbolers,
>>>>
>>>> I want to extract and then store entities from HTML documents that are
>>>> using Microdata annotations based on the type hierarchy of schema.org
>>>> as Ontology. I appreciate any kind of approach including the use of VIE.
>>>>
>>>> Many thanks in advance
>>>> Rüdiger
>>
>>
>
>
>
> --
> Dr. Walter Kasper
> DFKI GmbH
> Stuhlsatzenhausweg 3
> D-66123 Saarbrücken
> Tel.:  +49-681-85775-5300
> Fax:   +49-681-85775-5338
> Email: [email protected]
> -------------------------------------------------------------
> Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
> Firmensitz: Trippstadter Strasse 122, D-67663 Kaiserslautern
>
> Geschaeftsfuehrung:
> Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Vorsitzender)
> Dr. Walter Olthoff
>
> Vorsitzender des Aufsichtsrats:
> Prof. Dr. h.c. Hans A. Aukes
>
> Amtsgericht Kaiserslautern, HRB 2313
> -------------------------------------------------------------
>



--
| Rupert Westenthaler             [email protected]
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
