Hi everybody,

I'm Antonio David Pérez, a new Zaizi team member and an MSc student at the University of Seville. Lately, I have been involved in the development of a semantic CMS solution at a Spanish company called Ximdex, working with several technologies such as Apache Nutch, Apache Solr and Apache Stanbol.
Currently, I am assigned to a project that involves Apache Stanbol and Apache ManifoldCF. On the Stanbol side, I am interested in the disambiguation problem, and I would like to prepare a GSoC proposal on this topic. I have been following the recent mails about disambiguation and the WebID protocol. I would be more interested in developing disambiguation systems within Stanbol using the major semantic knowledge bases. My initial idea is to use Freebase, with the aim of making the solution extensible to other knowledge bases such as Wikipedia and DBpedia.

Following STANBOL-1037 [1], the main goal is to implement a couple of global-approach disambiguation algorithms for use in Stanbol. To that end, I would like to discuss some topics about the proposal:

- Knowledge Base: I have decided to stick to Freebase first, because it has a REST API that allows 100k calls per day for reads and 10k for writes. Besides the REST API, an alternative could be to integrate the whole Freebase graph into Stanbol and use their Java API to manage it. Ideally, the management framework should also be valid for other knowledge bases such as Wikipedia or DBpedia.

- Resources: As has been pointed out before on the mailing lists, Google has released a couple of resources that can be used in disambiguation applications. One is a dictionary of concepts from Wikipedia, which uses the anchor text of Wikipedia internal links to build an index of possible entity names [2]. The second is a dataset of texts that link to concepts in Wikipedia [3], which can be used as disambiguation contexts according to STANBOL-1037. I need to research whether similar information can be retrieved directly from Freebase or, in other words, check whether this information is already incorporated in Freebase.

Moreover, the proposal design will try to be as generic as possible, so that it can be adapted to any other knowledge base.

Waiting for your comments and valuable suggestions.
Thanks,
Regards

[1] https://issues.apache.org/jira/browse/STANBOL-1037
[2] http://googleresearch.blogspot.com.es/2012/05/from-words-to-concepts-and-back.html
[3] https://code.google.com/p/wiki-links/
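
P.S. To make the anchor-text dictionary idea from [2] a bit more concrete, here is a minimal sketch in Python of how such an index could be built and queried. It is only an illustration under my own assumptions: the function names (build_anchor_index, candidates), the lowercase normalization and the sample links are all hypothetical, and it does not reflect the actual format of the Google dictionary or any Stanbol or Freebase API. It simply counts how often each anchor text points to each target and ranks the candidate entities by relative frequency.

```python
from collections import Counter, defaultdict


def build_anchor_index(links):
    """Map each normalized anchor text to a Counter of link targets.

    `links` is an iterable of (anchor_text, target_entity) pairs.
    Lowercasing the anchor is an assumption made for this sketch.
    """
    index = defaultdict(Counter)
    for anchor, target in links:
        index[anchor.strip().lower()][target] += 1
    return index


def candidates(index, mention):
    """Return (entity, relative frequency) pairs, most frequent first."""
    counts = index.get(mention.strip().lower())
    if not counts:
        return []
    total = sum(counts.values())
    return [(entity, n / total) for entity, n in counts.most_common()]


# Toy (anchor text, target page) pairs, invented for illustration only.
SAMPLE_LINKS = [
    ("Paris", "Paris"),
    ("Paris", "Paris"),
    ("Paris", "Paris_Hilton"),
    ("paris", "Paris,_Texas"),
    ("Seville", "Seville"),
]

index = build_anchor_index(SAMPLE_LINKS)
ranked = candidates(index, "Paris")
for entity, score in ranked:
    print(entity, score)
```

A ranking like this only gives a prior for each mention taken in isolation. A global-approach algorithm in the sense of STANBOL-1037 would still need to combine these candidates across all mentions in a text, which this sketch does not attempt.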