Dear all,

Lately, as Apache Stanbol integrators, we at Zaizi have been working extensively with Enhancement Engines that not only link entities to Knowledge Bases (mainly DBpedia) but also disambiguate them. As you know, there are currently two engines in Stanbol that can be used for disambiguation: disambiguation-mlt [1], developed by Kritarth Anand as part of a GSoC project supervised by Rupert, and DBpedia Spotlight [2], contributed by Pablo Mendes and Iavor Jelev as part of the Early Adopters programme [3] and currently integrated in the trunk within an Enhancement Chain called dbpedia-spotlight.

While the dbpedia-spotlight Enhancement Chain can be used "out of the box", even with local installations of DBpedia Spotlight, the disambiguation-mlt engine is difficult to configure and get running. Also, after spending some days testing this engine, we found that the results weren't very good. So, after a couple of discussions with Rupert, and with feedback from one of our customers, we concluded that Stanbol needs to go further with its disambiguation engines, and we decided to start working on a completely new Disambiguation Framework that would also allow disambiguation with custom vocabularies and knowledge bases.

We would like to propose to the list a first draft of a roadmap for disambiguation in Stanbol. In our opinion, the high-level tasks that should be done are the following:

- Agree on a disambiguation index model that stores entities' surface forms and disambiguation contexts independently of the Knowledge Base, which also enables disambiguation with custom vocabularies.

- Design and develop tools for building such indexes, including a specific one for DBpedia - Wikipedia.

- Maintain disambiguation-mlt as a baseline disambiguation algorithm and adapt it to work with the newly designed index. Adapt it to work with the latest Enhancer release and merge it into the trunk in SVN.

- Design and develop new disambiguation algorithms based on entity co-occurrence, graph representations and statistical models.
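
To make the first item a bit more concrete, here is a minimal sketch in Python of what a KB-independent index entry could look like, together with a naive context-similarity ranking as a stand-in baseline. This is only an illustration under our own assumptions: the names IndexEntry, tokenize, cosine and disambiguate are hypothetical, the context weights are made up, and nothing here reflects the actual disambiguation-mlt code or any existing Stanbol API.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class IndexEntry:
    """One entity in a hypothetical KB-independent disambiguation index."""
    uri: str                    # entity identifier in any KB or custom vocabulary
    surface_forms: Set[str]     # labels that may refer to this entity
    context: Counter = field(default_factory=Counter)  # context term -> weight


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep only alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def disambiguate(mention: str, text: str, index: List[IndexEntry]) -> List[IndexEntry]:
    """Rank the entries whose surface forms match the mention by how
    similar their stored context is to the surrounding text."""
    wanted = mention.lower()
    candidates = [
        e for e in index if wanted in {s.lower() for s in e.surface_forms}
    ]
    query = Counter(tokenize(text))
    return sorted(candidates, key=lambda e: cosine(query, e.context), reverse=True)


# Toy index with made-up context weights (illustrative data only).
index = [
    IndexEntry(
        "http://dbpedia.org/resource/Paris",
        {"Paris"},
        Counter({"france": 3, "capital": 2, "seine": 1}),
    ),
    IndexEntry(
        "http://dbpedia.org/resource/Paris_Hilton",
        {"Paris", "Paris Hilton"},
        Counter({"hotel": 2, "heiress": 2, "television": 1}),
    ),
]

ranked = disambiguate("Paris", "Paris is the capital of France", index)
print([e.uri for e in ranked])
```

Because an entry carries only a URI, surface forms and a context vector, the same structure could be filled from DBpedia - Wikipedia or from a custom vocabulary, and the ranking function could later be swapped for the co-occurrence, graph or statistical approaches listed above.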


Any comments, feedback or ideas are more than welcome!

Regards

[1] - https://github.com/kritarthanand/Disambiguation-Stanbol
[2] - https://github.com/dbpedia-spotlight/dbpedia-spotlight
[3] - http://blog.iks-project.eu/dbpedia-spotlight-integration-in-apache-stanbol-2/
