Hi Rüdiger

On Sat, Mar 16, 2013 at 4:56 PM, Rüdiger Kurz <[email protected]> wrote:
> Hi Walter,
>
> thanks for the quick reply. Are the extracted entities from the
> htmlextractor enhancement engine automatically stored into the entity hub?
>

No component stores the Entities found in a ContentItem in the
Entityhub, as this is a very uncommon use case. Entities extracted
from parsed content are typically treated as suggestions, so one
would expect an additional user interaction (e.g. accepting and
storing an Entity) before they are added to the Entityhub.

However, if this is your use case, it should be simple to add such an
Enhancement Engine.

> What I want to reach is to get an index that stores the extracted entities
> and also the document itself with references on the entities related to this
> document. It would be great if that could be done by configuration only.

If the htmlextractor engine adds the extracted Entities to the metadata
of the ContentItem, you can access them with the LDPath configuration
of the Contenthub.

> Maybe someone could lend me a hand with building the right enhancement chain
> as a first step.
>

If you just want the Enhancer to extract Microdata, you can create a
chain that contains only the htmlextractor engine.
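
As a rough sketch of how a client could send an HTML page to such a
chain: the Enhancer exposes chains over REST under
/enhancer/chain/<name>. The chain name "microdata", the base URL
http://localhost:8080 and the helper names below are assumptions for
illustration only, so adjust them to your installation.

```python
from urllib.request import Request, urlopen


def build_enhance_request(html, base_url="http://localhost:8080", chain="microdata"):
    """Build a POST request that sends HTML to a named enhancement chain.

    'microdata' is a hypothetical chain name; use the name of the chain
    you configured with only the htmlextractor engine.
    """
    url = f"{base_url.rstrip('/')}/enhancer/chain/{chain}"
    return Request(
        url,
        data=html.encode("utf-8"),
        headers={
            "Content-Type": "text/html; charset=utf-8",
            # Ask for the enhancement results as RDF/XML.
            "Accept": "application/rdf+xml",
        },
        method="POST",
    )


def enhance(html, **kwargs):
    """Send the request and return the enhancement results as text."""
    with urlopen(build_enhance_request(html, **kwargs)) as response:
        return response.read().decode("utf-8")
```

Calling enhance("<html>...</html>") against a running instance should
return the metadata the chain produced for that page.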

> In my mind is building up a Solr Search UI offering entity based
> autosuggestion including spellchecker and faceted search.

For Entity based autosuggestion you might want to use the Entityhub.
The rest should be possible by using the Contenthub.
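
For the autosuggestion part, a minimal sketch of a prefix lookup
against the Entityhub find service might look like this. It assumes
the entities are already stored in the local Entityhub, that the
service is reachable at http://localhost:8080/entityhub/find, and that
the "name" parameter accepts a trailing wildcard; the helper name and
the default limit are made up for the example.

```python
from urllib.parse import urlencode


def build_suggest_url(prefix, base_url="http://localhost:8080", limit=10):
    """Build a find URL that matches entity names starting with `prefix`.

    The base URL and limit are illustrative defaults, not values taken
    from an actual installation.
    """
    # A trailing "*" asks for a prefix match on the entity name.
    query = urlencode({"name": prefix + "*", "limit": limit})
    return f"{base_url.rstrip('/')}/entityhub/find?{query}"
```

A search UI could request this URL on each keystroke and show the
returned labels as suggestions.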

>
> Thanks again.
>
> On 16.03.2013 16:29, Walter Kasper wrote:
>
>> Dear Rüdiger,
>>
>> The htmlextractor enhancement engine provides a microdata extractor that
>> should work well for schema.org annotations. Just test it with your data.
>>
>> Best regards,
>>
>> Walter
>>
>> Rüdiger Kurz wrote:
>>>
>>> Hello Stanbolers,
>>>
>>> I want to extract and then store entities from HTML documents that are
>>> using Microdata annotations based on the type hierarchy of schema.org
>>> as Ontology. I appreciate any kind of approach including the use of VIE.
>>>
>>> Many thanks in advance
>>> Rüdiger
>
>
> --
> Rüdiger Kurz
>
> -------------------
>
> Alkacon Software GmbH - The OpenCms Experts
> An der Wachsfabrik 13
> 50996 Koeln, DE
>
> http://www.alkacon.com
> http://www.opencms.org
>
> Geschäftsführer: Alexander Kandzior, Amtsgericht Köln, HRB 54613



--
| Rupert Westenthaler             [email protected]
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
