Dear all,
Lately, as Apache Stanbol integrators, we at Zaizi have been working
extensively with Enhancement Engines that not only link entities to
Knowledge Bases (mainly DBpedia) but also disambiguate them. As you
know, there are currently two engines in Stanbol that can be used for
disambiguation: disambiguation-mlt [1], developed by Kritarth Anand as
part of a GSoC project supervised by Rupert, and DBpedia Spotlight [2],
contributed by Pablo Mendes and Iavor Jelev as part of the Early
Adopters programme [3] and currently integrated in the trunk within an
Enhancement Chain called dbpedia-spotlight.
While the dbpedia-spotlight Enhancement Chain can be used
"out-of-the-box", even with local installations of DBpedia Spotlight,
the disambiguation-mlt engine is difficult to configure and get
running. Also, after spending some days testing this engine, we found
that the results weren't very good. So, after a couple of discussions
with Rupert, and with feedback from one of our customers, we concluded
that Stanbol needs to go further with disambiguation engines, and we
decided to start working on a completely new Disambiguation Framework
that would also allow disambiguation with custom vocabularies and
knowledge bases.
We would like to propose on the list a first draft of a roadmap for
disambiguation in Stanbol. In our opinion, the high-level tasks that
should be done are the following:
- Agree on a disambiguation index model that stores entities' surface
forms and disambiguation contexts independently of the Knowledge Base,
which also enables disambiguation with custom vocabularies.
- Design and develop tools for building such indexes, including a
specific one for DBpedia - Wikipedia.
- Maintain disambiguation-mlt as a baseline disambiguation algorithm and
adapt it to work with the newly designed index. Adapt it to work with
the latest Enhancer release and merge it into the trunk in SVN.
- Design and develop new disambiguation algorithms based on entity
co-occurrence, graph representations and statistical models.
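
To make the first task a bit more concrete, here is a minimal sketch of
what a Knowledge Base independent index could look like. It is only an
illustration, not a proposed design: the names IndexEntry,
DisambiguationIndex and disambiguate are hypothetical, the example.org
URIs are made-up sample data, Python is used just for brevity (an actual
engine would be written in Java), and the word-overlap scoring is a
naive stand-in for a real disambiguation algorithm.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """One entity in the index (hypothetical model, not a Stanbol API)."""
    uri: str                  # entity identifier in any knowledge base
    surface_forms: frozenset  # labels that may refer to the entity
    context: frozenset        # lower-cased words that tend to occur near it


class DisambiguationIndex:
    """Maps lower-cased surface forms to candidate entities."""

    def __init__(self):
        self._by_surface_form = defaultdict(list)

    def add(self, entry):
        for form in entry.surface_forms:
            self._by_surface_form[form.lower()].append(entry)

    def candidates(self, mention):
        return list(self._by_surface_form.get(mention.lower(), []))

    def disambiguate(self, mention, text):
        """Return the URI of the candidate whose context shares the most
        words with the text, or None if the mention is unknown."""
        words = set(text.lower().split())
        ranked = sorted(
            self.candidates(mention),
            key=lambda entry: len(entry.context & words),
            reverse=True,
        )
        return ranked[0].uri if ranked else None


# Made-up sample data: two entities sharing the surface form "Paris".
index = DisambiguationIndex()
index.add(IndexEntry(
    "http://example.org/Paris_city",
    frozenset({"Paris"}),
    frozenset({"france", "capital", "seine"}),
))
index.add(IndexEntry(
    "http://example.org/Paris_Hilton",
    frozenset({"Paris", "Paris Hilton"}),
    frozenset({"hotel", "heiress"}),
))
print(index.disambiguate("Paris", "Paris is the capital of France"))
```

The point of the sketch is that nothing in the entry model refers to
DBpedia: any vocabulary that can provide surface forms and some context
for its entities could be loaded into the same structure.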
Any comments, feedback or ideas are more than welcome!
Regards
[1] - https://github.com/kritarthanand/Disambiguation-Stanbol
[2] - https://github.com/dbpedia-spotlight/dbpedia-spotlight
[3] - http://blog.iks-project.eu/dbpedia-spotlight-integration-in-apache-stanbol-2/