Hi Olek,


On Sat, Mar 14, 2015 at 10:50 PM, Oleksandr Olgashko <
alexandrolg...@gmail.com> wrote:

> So far, I have done the warm-up tasks and am now going to dig into the
> source code. Could you please review my thoughts about the 5.19 task?
>
> (i) Is it interesting to have a unifying value for the selected candidate?
> How would you combine the values from the filters that are already in place?
> If I am not missing anything, the route is as follows: 1) create an
> annotated set of entities, 2) design a combining function for those three
> values, e.g. the mean, 3) play with the function and its coefficients to
> find the most suitable ones.
>

So yes, the current pipeline is:

  1. Get some surface forms
  2. Match those surface forms to candidate topics
  3. Get the contexts of the candidate topics
  4. Use a disambiguation function to calculate some scores (finalScore,
     percentageOfSecondRank)
  5. Filter out topics that fall below certain thresholds
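
To make step 5 concrete, here is a minimal sketch of the filtering stage. It
is not the Spotlight implementation: the Candidate class, the field names, the
threshold parameters and the direction of the second comparison (a high
second-rank ratio is treated as a sign of ambiguity) are all assumptions made
for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical container for one scored candidate topic.
    surface_form: str
    topic: str
    final_score: float
    percentage_of_second_rank: float


def filter_candidates(candidates, min_final_score, max_second_rank):
    """Keep candidates that clear both thresholds (assumed filter rule)."""
    return [
        c for c in candidates
        # Drop topics whose score is below the threshold.
        if c.final_score >= min_final_score
        # Assumption: a runner-up that scores almost as well means the
        # choice is ambiguous, so such candidates are dropped too.
        and c.percentage_of_second_rank <= max_second_rank
    ]
```

The score values fed into such a filter would come from step 4; any numbers
used to try it out are toy values.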

There are some tools for finding the best set of parameters (confidence,
support, etc.) for a given set of annotated data, e.g.:
https://github.com/diegoceccarelli/dexter-eval
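
The general idea of tuning a parameter against annotated data can be sketched
as a small grid search. This is only an illustration of the approach, not what
dexter-eval does internally; the function names, the (mention, topic) pair
representation and the choice of F1 as the target metric are assumptions.

```python
def f1_score(predicted, gold):
    """F1 between two sets of (mention, topic) pairs."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def best_confidence(scored, gold, thresholds):
    """Return (threshold, f1) for the threshold with the highest F1.

    scored: dict mapping (mention, topic) -> score from the annotator
    gold:   set of (mention, topic) pairs from the annotated data
    """
    best = (None, -1.0)
    for threshold in thresholds:
        # Keep only the annotations that survive this threshold.
        predicted = {pair for pair, s in scored.items() if s >= threshold}
        f1 = f1_score(predicted, gold)
        if f1 > best[1]:
            best = (threshold, f1)
    return best
```

The same loop extends to several parameters (e.g. confidence and support) by
iterating over their combinations.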

In our case, we have seen some dodgy quality in the vectors used during
disambiguation, which makes things a bit hard regardless of how good a
combining function you design.
There could be other methods of disambiguation that do not necessarily rely
on context vectors, or that use them together with other information, e.g.
graph information.


>
> (ii) Can the notion of entity relevance be equated with that of confidence?
> In general, no; that depends on how both are calculated. However, in the
> case of entity recognition, the relevance of a guess is derived from the
> features (e.g. if a word ends with "-er", that gives several points in favor
> of it being a profession) plus the algorithm for context, so these concepts
> are the same.
> One possible way (specific to DBpedia) to increase the precision of the
> algorithm is to find the "number of transitions in Wikipedia" between words
> in context. Am I thinking in the right direction?
>

I agree that confidence != relevance.
I'm not sure what you mean by:
""" "number of transitions in Wikipedia" between words in context. """
Do you mean the distance between topics in the DBpedia graph?
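
If it is graph distance, one simple reading is the number of hops on the
shortest path between two topics. A minimal sketch, assuming the graph is
given as an adjacency dict (the function name and the toy data are
hypothetical, not part of Spotlight or DBpedia):

```python
from collections import deque


def graph_distance(graph, start, goal):
    """Number of hops on the shortest path, or None if unreachable.

    graph: dict mapping a topic to the topics it links to.
    """
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    # Breadth-first search visits topics in order of increasing distance,
    # so the first time we reach the goal we have the shortest path.
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None
```

A small distance between the candidate topics of nearby surface forms could
then be used as extra evidence during disambiguation.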


>
>

> By the way, if I choose `Confidence` -> 0 in the online demo, select
> `n-best` and press `Annotate`, what do the numbers in the dropdown list
> mean? For example, for the word `First` the first two are World War I (1.00)
> and Football League First Division (1.45e-7).
>

These numbers correspond to a score named `finalScore`, which is based on the
context vectors. There is also a value called `percentageOfSecondRank`, which
estimates the finalScore of the next-best entity as a percentage of the
finalScore of the current one.
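
As a rough illustration of that second-rank value, the sketch below divides
each candidate's next-best final score by its own. It follows the description
above rather than the Spotlight source; expressing the result as a ratio
between 0 and 1, and using 0.0 when there is no next-best candidate, are
assumptions.

```python
def percentage_of_second_rank(final_scores):
    """Pair each score with the ratio next_best / current.

    Returns a list of (final_score, ratio) tuples, best candidate first.
    """
    ranked = sorted(final_scores, reverse=True)
    result = []
    for i, score in enumerate(ranked):
        # Assumption: the last candidate has no runner-up, so use 0.0.
        next_best = ranked[i + 1] if i + 1 < len(ranked) else 0.0
        ratio = next_best / score if score > 0 else 0.0
        result.append((score, ratio))
    return result
```

A ratio close to 0 means the current candidate clearly beats the next one; a
ratio close to 1 means the two are nearly tied.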

If you hit the candidates endpoint, you can get all of these scores. Here is
an example:

http://spotlight.sztaki.hu:2222/rest/candidates?confidence=0.0&text=First%20documented%20in%20the%2013th%20century,%20Berlin%20was%20the%20capital%20of%20the%20Kingdom%20of%20Prussia%20(1701%E2%80%931918),%20the%20German%20Empire%20(1871%E2%80%931918),%20the%20Weimar%20Republic%20(1919%E2%80%9333)%20and%20the%20Third%20Reich%20(1933%E2%80%9345).%20Berlin%20in%20the%201920s%20was%20the%20third%20largest%20municipality%20in%20the%20world.%20After%20World%20War%20II,%20the%20city%20became%20divided%20into%20East%20Berlin%20--%20the%20capital%20of%20East%20Germany%20--%20and%20West%20Berlin,%20a%20West%20German%20exclave%20surrounded%20by%20the%20Berlin%20Wall%20from%201961%E2%80%9389.%20Following%20German%20reunification%20in%201990,%20the%20city%20regained%20its%20status%20as%20the%20capital%20of%20Germany,%20hosting%20147%20foreign%20embassies
.
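
A URL like the one above can be built programmatically. This sketch only
constructs the request URL from the two query parameters visible in the
example (confidence and text); the helper name is hypothetical, and sending
the request and parsing the response are left out on purpose.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the example URL in this email.
BASE_URL = "http://spotlight.sztaki.hu:2222/rest/candidates"


def candidates_url(text, confidence=0.0):
    """Build a candidates request URL for the given text."""
    # quote_via=quote encodes spaces as %20, matching the example URL.
    query = urlencode({"confidence": confidence, "text": text}, quote_via=quote)
    return f"{BASE_URL}?{query}"
```

For instance, candidates_url("Berlin was the capital") gives the base URL
followed by ?confidence=0.0&text=Berlin%20was%20the%20capital.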


>
> 2015-03-09 14:52 GMT+02:00 Oleksandr Olgashko <alexandrolg...@gmail.com>:
>
>> Found warm-up tasks for DBpedia Spotlight, sorry for inconvenience
>>
>> 2015-03-09 13:06 GMT+02:00 Oleksandr Olgashko <alexandrolg...@gmail.com>:
>>
>>> Thanks for answers,
>>>
>>> On my previous project I worked on several named entity recognition
>>> classifiers (naive Bayes and conditional random field based; we used
>>> OntoNotes corpus data). I also have brief experience with Apache Spark.
>>> So 5.16 and 5.17 would probably be the most suitable for me, and 5.14 is
>>> worth thinking about.
>>> Could you please give me some warm-up tasks for these ideas?
>>> Also, is it possible to use Stanford NLP (GPL license)?
>>>
>>> 2015-03-09 12:42 GMT+02:00 David Przybilla <dav.alejan...@gmail.com>:
>>>
>>>> Hi Oleksandr,
>>>>
>>>> 5.16 and 5.17 both involve Scala plus a bit of natural language processing.
>>>> 5.17 is more about being able to massage a Wikipedia dump and get
>>>> numbers out of it for named entity recognition.
>>>>
>>>>
>>>>
>>>> On Mon, Mar 9, 2015 at 9:27 AM, Dimitris Kontokostas <jimk...@gmail.com
>>>> > wrote:
>>>>
>>>>> Hi Oleksandr & welcome
>>>>>
>>>>> I'd suggest you narrow down your topics to very few (1-2) in order to be
>>>>> able to focus better on your final proposal.
>>>>> Let us know if you have any questions.
>>>>>
>>>>> Cheers,
>>>>> Dimitris
>>>>>
>>>>> On Sun, Mar 8, 2015 at 11:59 PM, Oleksandr Olgashko <
>>>>> alexandrolg...@gmail.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I'd like to investigate the possibility of participating in GSoC with
>>>>>> the DBpedia organization. Since I have never participated in GSoC before,
>>>>>> some questions may sound naive.
>>>>>>
>>>>>> My name is Oleksandr Olgashko, I'm a first year master's student in
>>>>>> Taras Shevchenko National University of Kyiv (Ukraine). Some links about 
>>>>>> me:
>>>>>> https://github.com/dveim
>>>>>> https://www.linkedin.com/in/olgashko
>>>>>> https://www.coursera.org/user/i/d5878dc26bfe6cbe456d0e119d96e551
>>>>>>
>>>>>> My primary interests are machine learning (particularly natural
>>>>>> language processing, which I was doing on my previous project) and data
>>>>>> analysis. I'm also a fan of the Scala programming language. DBpedia is
>>>>>> the most natural combination of those skills.
>>>>>>
>>>>>> On your ideas page I've found several interesting projects, like 5.3,
>>>>>> 5.7, 5.14, 5.16, 5.17. Which of them are more relevant, so that I can
>>>>>> start researching them in more depth?
>>>>>>
>>>>>> Thanks for answers
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ------------------------------------------------------------------------------
>>>>>> Dive into the World of Parallel Programming The Go Parallel Website,
>>>>>> sponsored
>>>>>> by Intel and developed in partnership with Slashdot Media, is your
>>>>>> hub for all
>>>>>> things parallel software development, from weekly thought leadership
>>>>>> blogs to
>>>>>> news, videos, case studies, tutorials and more. Take a look and join
>>>>>> the
>>>>>> conversation now. http://goparallel.sourceforge.net/
>>>>>> _______________________________________________
>>>>>> Dbpedia-gsoc mailing list
>>>>>> Dbpedia-gsoc@lists.sourceforge.net
>>>>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Kontokostas Dimitris
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>