So far, I have done the warm-up tasks, and I am now going to dig into the source code.
Could you please review my thoughts on task 5.19?
(i) Is it interesting to have a unifying value for the selected candidate?
How would you combine the values from the filters that are already in place?
If I am not missing anything, the route is as follows: 1) create an
annotated set of entities; 2) design a combining function for those three
values, e.g. the mean; 3) experiment with the function and its coefficients
to find the most suitable combination.
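
To make the route above concrete, here is a minimal sketch of what I have in
mind. It is not taken from the Spotlight code: the three scores per candidate,
the function names and the toy annotated set are all hypothetical, and the
combining function is assumed to be a weighted mean tuned by a coarse grid
search.

```python
from itertools import product


def combine(scores, weights):
    """Weighted mean of the per-candidate scores (hypothetical combiner)."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def accuracy(annotated, weights):
    """Fraction of mentions whose top-ranked candidate is the gold entity."""
    hits = 0
    for candidates, gold in annotated:
        best = max(candidates, key=lambda name: combine(candidates[name], weights))
        hits += best == gold
    return hits / len(annotated)


def grid_search(annotated, steps=(0.0, 0.5, 1.0)):
    """Try every weight triple on a coarse grid and keep the best one."""
    grid = [w for w in product(steps, repeat=3) if sum(w) > 0]
    return max(grid, key=lambda w: accuracy(annotated, w))


# Toy annotated set (made up): candidate -> three scores, plus the gold entity.
ANNOTATED = [
    ({"A": (0.9, 0.2, 0.1), "B": (0.1, 0.8, 0.9)}, "B"),
    ({"C": (0.7, 0.3, 0.2), "D": (0.2, 0.6, 0.8)}, "D"),
]

best_weights = grid_search(ANNOTATED)
```

With a real annotated set, the weights would of course have to be chosen on
one part of the data and checked on a held-out part.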
(ii) Can the notion of entity relevance be equated with that of confidence?
In general, no; that depends on how both are calculated. However, in the
case of entity recognition, the relevance of a guess is derived from
features (e.g. if a word ends with "-er", that gives several points in favor
of it being a profession) plus an algorithm for the context, so here the two
concepts are the same.
One possible way (specific to DBpedia) to increase the precision of the
algorithm is to find the "number of transitions in Wikipedia" between the
words in the context. Am I thinking in the right direction?
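
One possible reading of "number of transitions", sketched below under my own
assumptions: treat Wikipedia as a link graph and count how many context pages
are directly linked to or from each candidate. The function names and the tiny
link table are hypothetical toy data, not real Wikipedia statistics or
Spotlight APIs.

```python
def link_count(links, candidate, context):
    """Number of context pages directly linked to or from the candidate."""
    outgoing = links.get(candidate, set())
    return sum(
        1
        for page in context
        if page in outgoing or candidate in links.get(page, set())
    )


def rerank(links, candidates, context):
    """Order candidates by how many context pages they are linked with."""
    return sorted(
        candidates,
        key=lambda c: link_count(links, c, context),
        reverse=True,
    )


# Toy link table (made up): page -> set of pages it links to.
LINKS = {
    "World_War_I": {"Trench_warfare", "Treaty_of_Versailles"},
    "Football_League_First_Division": {"Premier_League"},
    "Trench_warfare": {"World_War_I"},
}

ranked = rerank(
    LINKS,
    ["Football_League_First_Division", "World_War_I"],
    ["Trench_warfare", "Treaty_of_Versailles"],
)
```

Such a count could then serve as one more value fed into the combining
function from (i).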
By the way, if I set `Confidence` to 0 in the online demo, select `n-best`
and press `Annotate`, what do the numbers in the dropdown list mean? For
example, for the word `First` the first two are World War I (1.00) and
Football League First Division (1.45e-7).
2015-03-09 14:52 GMT+02:00 Oleksandr Olgashko <[email protected]>:
> Found warm-up tasks for DBpedia Spotlight, sorry for inconvenience
>
> 2015-03-09 13:06 GMT+02:00 Oleksandr Olgashko <[email protected]>:
>
>> Thanks for answers,
>>
>> On previous project I was working on several named entity recognition
>> classifiers (naive Bayes and conditional random field based, we used
>> Ontonotes corpus data), also I have brief experience with Apache Spark.
>> So, probably, 5.16 and 5.17 would be most suitable for me, and 5.14 is
>> worth to think about.
>> Could you please give some warm-up tasks for these ideas?
>> Also, is it possible to use Stanford NLP (GPL license?)
>>
>> 2015-03-09 12:42 GMT+02:00 David Przybilla <[email protected]>:
>>
>>> Hi Oleksandr,
>>>
>>> 5.16, 5.17 both involve Scala + A bit of Natural Language Processing.
>>> 5.17 is more about being able to massage a wikipedia dump and getting
>>> numbers out of it for Name entity recognition.
>>>
>>>
>>>
>>> On Mon, Mar 9, 2015 at 9:27 AM, Dimitris Kontokostas <[email protected]>
>>> wrote:
>>>
>>>> Hi Oleksandr & welcome
>>>>
>>>> I'd suggest you narrow down your topics to very few 1-2 in order to be
>>>> able to better focus on your final proposal.
>>>> Let us know if you have any questions
>>>>
>>>> Cheers,
>>>> DImitris
>>>>
>>>> On Sun, Mar 8, 2015 at 11:59 PM, Oleksandr Olgashko <
>>>> [email protected]> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I'd like to investigate possibilities to participate in GSoC as part
>>>>> of DBpedia organizations. Since I never participated in GSoC before, some
>>>>> questions may sound naive.
>>>>>
>>>>> My name is Oleksandr Olgashko, I'm a first year master's student in
>>>>> Taras Shevchenko National University of Kyiv (Ukraine). Some links about
>>>>> me:
>>>>> https://github.com/dveim
>>>>> https://www.linkedin.com/in/olgashko
>>>>> https://www.coursera.org/user/i/d5878dc26bfe6cbe456d0e119d96e551
>>>>>
>>>>> My primary interests are machine learning (particularly, natural
>>>>> language processing, what I was doing on previous project) and data
>>>>> analysis, also I'm a fan of Scala programming language. DBpedia has most
>>>>> natural combination of those skills.
>>>>>
>>>>> On your ideas page I've found several interesting projects, like 5.3,
>>>>> 5.7, 5.14, 5.16, 5.17. Which of them are more relevant, so I can start
>>>>> research deeper?
>>>>>
>>>>> Thanks for answers
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>> Dive into the World of Parallel Programming The Go Parallel Website,
>>>>> sponsored
>>>>> by Intel and developed in partnership with Slashdot Media, is your hub
>>>>> for all
>>>>> things parallel software development, from weekly thought leadership
>>>>> blogs to
>>>>> news, videos, case studies, tutorials and more. Take a look and join
>>>>> the
>>>>> conversation now. http://goparallel.sourceforge.net/
>>>>> _______________________________________________
>>>>> Dbpedia-gsoc mailing list
>>>>> [email protected]
>>>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Kontokostas Dimitris
>>>>
>>>
>>
>