Hi Rochbenritter,

Samsung is not found because the ontology defines its labels as
xsd:string literals.

Here is the relevant excerpt of the ontology:

<gr:BusinessEntity rdf:ID="Samsung">
    <gr:legalName rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Samsung Group</gr:legalName>
    <rdfs:seeAlso rdf:resource="http://www.samsung.com/"/>
    <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Samsung</rdfs:label>
    <rdfs:comment xml:lang="en">The business entity Samsung Group.</rdfs:comment>
    <belongsToModule xml:lang="en">MP3Player, TV, Printer, DigitalCamera, Camcorder</belongsToModule>
</gr:BusinessEntity>

I would have expected something like

    <rdfs:label xml:lang="en">Samsung</rdfs:label>

or simply

    <rdfs:label>Samsung</rdfs:label>

Because those labels are defined as xsd:string, the indexing tool
indexes them like this:

    <arr name="str/gr:legalName/"><str>Samsung Group</str></arr>
    <arr name="str/rdfs:label/"><str>Samsung</str></arr>

Natural language label fields, by contrast, start with a '@'.
Here is the example for the rdfs:comment field:

    <arr name="@en/rdfs:comment/"><str>The business entity Samsung Group.</str></arr>

This is also the reason why those Entities are missing in the
EntityLinking results. xsd:string values are currently not considered
by the Entity Linking Engines.
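
To see which entities are affected, something like the following rough
sketch could list every literal typed as xsd:string in an RDF/XML file.
The function name find_xsd_string_literals is made up for illustration;
it only uses Python's standard library and is not part of Stanbol.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def find_xsd_string_literals(rdf_xml):
    """Return (subject, property tag, value) for each xsd:string literal.

    Hypothetical helper: the subject is taken from rdf:ID or rdf:about,
    and the property tag is reported in ElementTree's {namespace}local form.
    """
    root = ET.fromstring(rdf_xml)
    hits = []
    for subject in root.iter():
        for prop in subject:
            if prop.get("{%s}datatype" % RDF_NS) == XSD_STRING:
                name = subject.get("{%s}ID" % RDF_NS) or subject.get("{%s}about" % RDF_NS)
                # Collapse line breaks and repeated spaces inside the literal.
                value = " ".join((prop.text or "").split())
                hits.append((name, prop.tag, value))
    return hits
```

Every property reported here for a label field is one that the linking
engines would currently skip.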

IMO EntityLinking should also consider xsd:string values, so I
clearly consider this an issue of the Stanbol Entity Linking
Engines. I will analyze the implementations of both the Entityhub
Linking Engine and the Lucene FST Linking Engine and see how to solve
this issue.

As a workaround I see two possible solutions:

(a) remove all rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
attributes from the ceo.owl file
(b) apply the following mappings to the "./indexing/config/mappings.txt" file

rdfs:label | d=entityhub:text
gr:legalName | d=entityhub:text

NOTE: there is already a line "rdfs:label". You should replace this
with "rdfs:label | d=entityhub:text"
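
For workaround (a), a minimal sketch of how the attributes could be
stripped with a regular expression. The function name is hypothetical,
and the pattern assumes the datatype is written either as the full IRI
(as in the excerpt above) or as the "&xsd;string" entity form; please
check which form your ceo.owl actually uses and keep a backup of the file.

```python
import re

# Matches an rdf:datatype attribute whose value is xsd:string, written either
# as the full IRI or as the &xsd; entity, including the whitespace before it.
XSD_STRING_ATTR = re.compile(
    r"""\s+rdf:datatype\s*=\s*(["'])(?:http://www\.w3\.org/2001/XMLSchema#|&xsd;)string\1"""
)


def strip_xsd_string_datatype(rdf_xml):
    """Return the RDF/XML text with all xsd:string datatype attributes removed.

    Other datatypes (xsd:int, xsd:float, ...) are left untouched.
    """
    return XSD_STRING_ATTR.sub("", rdf_xml)
```

Read ceo.owl as text, pass it through strip_xsd_string_datatype() and
write the result to a new file before re-running the indexing tool.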

For your understanding: "{field} | d=entityhub:text" tells the indexing
tool to convert values of that field to the natural language text
datatype. Doing so will result in a Solr index that contains both the
xsd:string and the text version.

<arr name="str/rdfs:label/"><str>Samsung</str></arr>
<arr name="@/rdfs:label/"><str>Samsung</str></arr>
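
For workaround (b), a small sketch of how the mappings file could be
updated automatically. The function apply_text_mappings is made up for
illustration and knows nothing about the mappings.txt syntax beyond the
two lines given above: it replaces a bare field line with the
"| d=entityhub:text" variant and appends any mapping that is still missing.

```python
def apply_text_mappings(mapping_text, fields=("rdfs:label", "gr:legalName")):
    """Return mapping_text with '<field> | d=entityhub:text' set for each field.

    Hypothetical helper: a line consisting only of the field name is replaced,
    a line that already carries the mapping is kept, and fields that are not
    covered afterwards get their mapping appended at the end.
    """
    wanted = {f: "%s | d=entityhub:text" % f for f in fields}
    done = set()
    out = []
    for line in mapping_text.splitlines():
        stripped = line.strip()
        if stripped in wanted:
            # Bare field name, e.g. the existing "rdfs:label" line.
            out.append(wanted[stripped])
            done.add(stripped)
            continue
        out.append(line)
        for field, target in wanted.items():
            # Compare ignoring whitespace to detect an existing mapping.
            if "".join(stripped.split()) == "".join(target.split()):
                done.add(field)
    for field in fields:
        if field not in done:
            out.append(wanted[field])
    return "\n".join(out) + "\n"
```

Running it twice gives the same result, so it should be safe to apply
to a file that was already edited by hand.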

Thanks for your report. Before that I was completely unaware that
xsd:string values were not considered by the EntityLinking engines.
best
Rupert

On Tue, Sep 30, 2014 at 4:33 PM, Rochbenritter . <mowenw...@gmail.com> wrote:
> Hi All,
>
> We followed the instructions from
> https://stanbol.apache.org/docs/trunk/customvocabulary.html to create a new
> site for the CEO ontology (
> http://www.ebusiness-unibw.org/ontologies/consumerelectronics/v1).
>
> We managed to process the Ontology and to upload it as a site to the Apache
> Stanbol server. But it is not working totally correct. When an entry from
> the CEO-ontology refers to a rdf:type defined in the
> goodRelations-ontology, then the individual entry can't be found. We assume
> additional mapping is needed. (e.g. "LCD" and "Ambilight 1" found but
> "Philips" or "Samsung" not).
>
> You can see our configuration in this Google-Drive folder
> https://drive.google.com/folderview?id=0B59O-GwTGmjjWVNPUGhYeGJmVXM&usp=drive_web
>
> How can we fix this problem?
>
> Cheers,



-- 
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11                              ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO 
..........................................................................
| http://redlink.co/
