Hi (again)
> (2) dbp-ont:surfaceForm
>
> I recommended that you copy the labels of redirected pages to the
> "dbp-ont:surfaceForm" field. In the meantime I ran some tests with an
> index built that way. The results were really bad, so I must revoke
> this recommendation!
>
> The reason is that Solr's scoring algorithm is affected by the
> multi-valued "dbp-ont:surfaceForm" field. For example, dbpedia:Paris
> has ~35 "dbp-ont:surfaceForm" values, of which only about 15 contain
> "Paris". So if you now query for Paris in this field
>
> (((@en/dbp\-ont\:surfaceForm/:"paris")))
>
> you will notice that dbpedia:Paris is not within the top 10 search
> results. Instead, Entities like "Paris Barclay" are listed, because
> they have only a single "dbp-ont:surfaceForm" value and the match for
> "Paris" is therefore scored as much more relevant.
I just talked about this problem with Sebastian Schaffert. He suggested
setting
omitNorms="true"
for all fields used for labels within the Entityhub. This should have
the effect that Entities with a lot of "dbp-ont:surfaceForm" values
are no longer penalized by the Solr ranking algorithm. Testing this
will take some time.
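
To illustrate why omitNorms might help, here is a rough Python sketch of
the length normalization. It assumes Lucene's classic TF-IDF similarity
(tf = sqrt(freq), lengthNorm = 1/sqrt(number of tokens in the field)) and
ignores idf, boosts and the lossy byte encoding of norms. The token
counts are made up (about 2 tokens per surface form), and classic_score
is a hypothetical helper, not a Solr API.

```python
import math


def classic_score(term_freq, field_length, omit_norms=False):
    """Simplified per-field score: tf * lengthNorm (idf and boosts left out)."""
    tf = math.sqrt(term_freq)
    # With omitNorms="true" no length norm is stored, so it acts like 1.0.
    norm = 1.0 if omit_norms else 1.0 / math.sqrt(field_length)
    return tf * norm


# Assumed numbers: ~35 surface forms of ~2 tokens each, 15 containing "paris".
paris = {"term_freq": 15, "field_length": 70}
# Assumed numbers: a single surface form "Paris Barclay" with 2 tokens.
paris_barclay = {"term_freq": 1, "field_length": 2}

for omit in (False, True):
    a = classic_score(omit_norms=omit, **paris)
    b = classic_score(omit_norms=omit, **paris_barclay)
    print(f"omitNorms={omit}: dbpedia:Paris={a:.3f}  Paris Barclay={b:.3f}")
```

Under these assumptions the long multi-valued field loses with norms
enabled and wins once they are omitted. Whether the real index behaves
the same way still needs to be tested.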
best
Rupert
>
> This means that the current index layout, where URIs of redirected
> pages are represented as separate Entities within the index, is much
> better suited for entity extraction.
--
| Rupert Westenthaler [email protected]
| Bodenlehenstraße 11 ++43-699-11108907
| A-5500 Bischofshofen