Hi,

We're using a self-hosted instance of the statistical backend without
context.mem (so that it can run with 4GB of RAM). Even without
context.mem, spotlight does an amazing job at resource disambiguation.

There's a small problem, though. With the default spotter_thresholds values
(1.0 0.2 -0.2 0.25), spotlight never annotates "Google", but it does annotate
all Google products (e.g., Google Drive) and "Google Inc.". While reading
spotlight's documentation I came across this wiki page:
https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/ISemantics-2013-Supporting-Data
and the Challenges section.

The part that made me think it might be relevant is:

    "After further investigation, we found that these were all cases where
    the surface form is a substring of another surface form."

Could it be the reason?
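
To make the hypothesis concrete, here is a small Python sketch of what I
mean by one surface form being a substring of another. The list of surface
forms is made up for illustration, and the token-based containment check is
my own assumption; it is not how spotlight's spotter works internally.

```python
def contained_surface_forms(surface_forms):
    """Map each surface form to the longer surface forms that contain it
    as a contiguous sequence of whitespace-separated tokens."""
    tokenized = {sf: sf.split() for sf in surface_forms}
    result = {}
    for short, short_toks in tokenized.items():
        n = len(short_toks)
        longer = []
        for other, other_toks in tokenized.items():
            # Only strictly longer surface forms can contain this one.
            if len(other_toks) <= n:
                continue
            windows = (other_toks[i:i + n] for i in range(len(other_toks) - n + 1))
            if any(window == short_toks for window in windows):
                longer.append(other)
        if longer:
            result[short] = sorted(longer)
    return result


# Hypothetical sample, not taken from the actual spotlight model.
forms = ["Google", "Google Drive", "Google Inc.", "World War II", "II"]
print(contained_surface_forms(forms))
```

With this sample, both "Google" and "II" show up as being contained in
longer surface forms, which are exactly the two cases I describe in this
mail.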

It's worth mentioning that increasing the last spotter_thresholds value
from 0.25 to 0.3 does make spotlight annotate Google, but it also
annotates some other words that are not desirable (e.g., it always
annotates "II" as World_War_II).

Another thing I noticed is that even the Lucene backend doesn't annotate
Google unless the spotter is "CoOccurrenceBasedSelector".
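
In case someone wants to reproduce this, below is roughly how I check which
surface forms get annotated, as a minimal Python sketch against the REST
annotate endpoint. The host and port (localhost:2222) and the confidence and
support values are assumptions for a local setup, so adjust them to your
instance. The helper names are mine.

```python
import json
import urllib.parse
import urllib.request

# Assumed address of a self-hosted instance; change it to match your server.
BASE_URL = "http://localhost:2222/rest/annotate"


def build_request(text, confidence=0.2, support=0):
    """Build a GET request for the annotate endpoint, asking for JSON."""
    query = urllib.parse.urlencode(
        {"text": text, "confidence": confidence, "support": support}
    )
    return urllib.request.Request(
        BASE_URL + "?" + query, headers={"Accept": "application/json"}
    )


def annotated_surface_forms(response):
    """Return the surface forms annotated in a parsed JSON response.

    The "Resources" key may be absent when nothing was annotated, so an
    empty list is returned in that case.
    """
    return [r["@surfaceForm"] for r in response.get("Resources", [])]


def annotate(text, **params):
    """Send the text to the server and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(text, **params)) as resp:
        return json.load(resp)
```

Calling annotated_surface_forms(annotate("Google released Google Drive."))
before and after changing spotter_thresholds should show whether "Google"
itself appears in the list or only "Google Drive".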

Cheers,

amir