Hi Dileepa,
On 23/04/13 13:45, Dileepa Jayakody wrote:
Hi Fabian et al,
Thanks a lot for your valuable ideas.
Yes, it would be really interesting to implement a 'person | organization'
disambiguation module using the WebID protocol as a Stanbol Enhancement
Engine. I have gone through the Stanbol documentation and now have an
overall idea of its architecture.
+1. That's a great idea. I don't know much about the WebID protocol,
but as long as you can use profile data as disambiguation contexts,
it should be feasible to implement a disambiguation algorithm.
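To make "profile data as disambiguation context" a bit more concrete, here is a minimal sketch of one common approach (bag-of-words cosine similarity), not necessarily what a Stanbol engine would use. It assumes each candidate's WebID profile has already been flattened into plain text (e.g. name, affiliation, interests); the profile URIs below are hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from typing import Optional, Tuple


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric runs; a real engine would also drop stopwords.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(mention_context: str,
                 candidates: dict[str, str]) -> Optional[Tuple[str, float]]:
    """Pick the candidate whose (assumed, flattened) profile text best matches the mention context."""
    ctx = Counter(tokenize(mention_context))
    best: Optional[Tuple[str, float]] = None
    for entity_id, profile_text in candidates.items():
        score = cosine(ctx, Counter(tokenize(profile_text)))
        if best is None or score > best[1]:
            best = (entity_id, score)
    return best


# Hypothetical WebID profile URIs and profile texts, for illustration only.
cands = {
    "https://alice.example/profile#me": "Alice Smith works at Example Corp on semantic web research in London",
    "https://alice2.example/card#i": "Alice Smith is a jazz pianist from New Orleans",
}
print(disambiguate("Alice Smith presented her semantic web research", cands))
```

The interesting part would then be deciding which profile properties to flatten into that text, which is where the "what information can we gather from WebID" question matters.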
Could you please give us a concrete example where WebID is used? I
suppose that the general use case is to link name mentions in web pages
with digital identities. What kind of information is it possible to
gather from WebID identities?
It would be great to get more ideas and suggestions on how to use Stanbol
for person and organization disambiguation, and to discuss the objectives
and milestones of the GSoC project idea at [1].
I also think one of the main factors in disambiguation is the
dataset/knowledge base used in the process. What dataset does Stanbol
use to verify data? Is the recently released Google Wikilinks corpus [1]
a candidate Stanbol dataset?
In principle, you can use any knowledge base with Stanbol. I always think
of the EntityHub component as a "knowledge base" management system,
although formally the EntityHub may not be exactly that. Anyway, Google
Wikilinks could be a good resource for disambiguation when the knowledge
base is Wikipedia or DBpedia. In fact, Wikilinks contains 40 million
mentions, together with their contexts, retrieved from web pages. This
information could eventually be added to a Wikipedia or DBpedia knowledge
base as disambiguation contexts for the entities covered by the dataset.
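As a rough sketch of how mention contexts could become per-entity disambiguation contexts, the snippet below assumes the corpus has already been parsed into (mention, context, target URL) tuples; that tuple shape is an assumption for illustration, not the actual Wikilinks file format, and the sample records are made up.

```python
from __future__ import annotations

import re
from collections import Counter, defaultdict
from typing import Iterable


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_context_index(records: Iterable[tuple[str, str, str]]) -> dict[str, Counter]:
    """Aggregate the context words of every mention that links to the same entity URL.

    Each record is assumed to be (mention, context, target_url).
    """
    index: dict[str, Counter] = defaultdict(Counter)
    for _mention, context, target_url in records:
        index[target_url].update(tokenize(context))
    return dict(index)


# Made-up records, for illustration only.
records = [
    ("Paris", "capital of France on the Seine", "http://dbpedia.org/resource/Paris"),
    ("Paris", "Paris Hilton hotel heiress", "http://dbpedia.org/resource/Paris_Hilton"),
    ("City of Light", "France capital city", "http://dbpedia.org/resource/Paris"),
]
index = build_context_index(records)
print(index["http://dbpedia.org/resource/Paris"].most_common(2))
```

Each resulting term vector could then be stored alongside the entity in the EntityHub and compared against the context of a new mention.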
Another interesting resource, as the TechCrunch article points out, is the
dictionary of Wikipedia concepts released last year [1]. This resource
can be used to add more labels (possible names) for each entity, thereby
improving the candidate selection step. As always, such a dictionary
poses a recall/precision trade-off.
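A minimal sketch of dictionary-based candidate selection and its recall/precision knob, assuming the dictionary can be reduced to (surface string, entity) link observations; that input shape and the "ex:" entity ids are hypothetical, not the released dictionary's actual format. Keeping every candidate (min_prob=0) maximizes recall; raising min_prob improves precision but drops rarer senses.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def build_dictionary(pairs: Iterable[tuple[str, str]]) -> dict[str, dict[str, int]]:
    """Count how often each (lower-cased) surface string links to each entity."""
    counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for surface, entity in pairs:
        counts[surface.lower()][entity] += 1
    return counts


def candidates(dictionary: dict[str, dict[str, int]], mention: str,
               min_prob: float = 0.0) -> list[tuple[str, float]]:
    """Return (entity, P(entity | mention)) pairs with probability >= min_prob, best first."""
    by_entity = dictionary.get(mention.lower(), {})
    total = sum(by_entity.values())
    if total == 0:
        return []
    scored = [(entity, count / total) for entity, count in by_entity.items()]
    kept = [pair for pair in scored if pair[1] >= min_prob]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical link observations, for illustration only.
pairs = [("Java", "ex:java-language")] * 3 + [("java", "ex:java-island")]
d = build_dictionary(pairs)
print(candidates(d, "Java"))                # high recall: both senses
print(candidates(d, "Java", min_prob=0.5))  # higher precision: island sense dropped
```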
[1] http://googleresearch.blogspot.com.es/2012/05/from-words-to-concepts-and-back.html
Regards!
Thanks,
Dileepa
[1] http://techcrunch.com/2013/03/08/google-research-releases-wikilinks-corpus-with-40m-mentions-and-3m-entities/
On Mon, Apr 22, 2013 at 7:32 PM, Fabian Christ <[email protected]> wrote:
Hi,
2013/4/22 Dileepa Jayakody <[email protected]>:
Could integrating the WebID protocol into Stanbol to create social graphs
and related ontologies be a valid use case?
The already mentioned entity disambiguation for persons might be such
a use case.
Another idea could be that the enhancement process uses some
information from the personal profile of the user who sends the
request. I do not have a concrete example at the moment, but engines
might be interested in knowing who sent an enhancement request.
This may also be relevant information for the disambiguation task.
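One hypothetical way an engine could use the requester's profile: add the profile's words to the document context with a lower weight, then rank candidates as usual. This assumes the engine somehow receives the requester's profile as plain text (e.g. interests taken from a WebID/FOAF profile), which is an assumption here, not an existing Stanbol feature; the candidate ids are made up.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def query_context(doc_context: str, requester_profile: str = "",
                  profile_weight: float = 0.5) -> Counter:
    """Document words count 1 each; the requester's (assumed) profile words are added with a lower weight."""
    ctx = Counter(tokenize(doc_context))
    for token in tokenize(requester_profile):
        ctx[token] += profile_weight
    return ctx


def rank(ctx: Counter, candidates: dict[str, str]) -> list[tuple[str, float]]:
    """Score each candidate's description against the context, best first."""
    scored = [(eid, cosine(ctx, Counter(tokenize(text)))) for eid, text in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical candidates for an ambiguous mention.
cands = {
    "ex:smith-researcher": "John Smith semantic web researcher",
    "ex:smith-footballer": "John Smith football player",
}
print(rank(query_context("John Smith"), cands)[0][0])                  # no profile
print(rank(query_context("John Smith", "semantic web"), cands)[0][0])  # requester works on semantic web
```

Keeping profile_weight below 1 lets the document text dominate, so the requester's interests only tip the balance when the document itself is ambiguous.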
Best,
- Fabian
--
Fabian
http://twitter.com/fctwitt