Hi Abhishek,

You are free to contribute :) I will try to keep reviewing PRs,
if that is alright.



On Tue, Apr 28, 2015 at 7:47 AM, Abhishek Gupta <a.gu...@gmail.com> wrote:

> Hi all,
>
> My proposal has not been selected for GSoC, but I still want to
> continue with my project. Can someone provide me with any guidelines (if I
> can continue)?
>
> Thanks,
> Abhishek
>
> On Thu, Apr 9, 2015 at 11:53 PM, Abhishek Gupta <a.gu...@gmail.com> wrote:
>
>> Hi Thiago,
>>
>> Thanks for your reply and assurance.
>> Moreover, I replied to your question about the extraction framework, and I
>> have also created an issue about using bold instances as probable
>> surface forms here
>> <https://github.com/dbpedia-spotlight/dbpedia-spotlight/issues/353>.
>>
>> Thanks,
>> Abhishek
>>
>> On Thu, Apr 9, 2015 at 1:19 AM, Thiago Galery <tgal...@gmail.com> wrote:
>>
>>> Hi Abhishek,
>>> sorry for taking so long to write to you. Things at work have been
>>> really busy. About the issue you raised about the originality of your
>>> proposal, rest assured that no one sent a proposal similar to yours.
>>>
>>> I'm happy that you sent a PR for the extraction framework. It seems that
>>> Dimitris is already taking a look at it.
>>> As for your suggestions in Spotlight, I would not advise simply removing
>>> the stopword filter, because I remember getting a lot of crap once when
>>> it was off. Maybe it should be modified somehow. If you have a good idea
>>> and want to send a PR, it would be very welcome. I think discussing things
>>> on GitHub would be better.
>>>
>>> All the best,
>>> Thiago
>>>
>>> On Mon, Apr 6, 2015 at 6:15 AM, Abhishek Gupta <a.gu...@gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Recently I was checking out the indexing process of dbpedia-spotlight
>>>> and I observed a few things:
>>>>
>>>> 1) There is a missing constructor definition in the WikiPage object
>>>> <https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/sources/WikiPage.scala>
>>>> for the
>>>> instance defined in the function wikiPageCopy here
>>>> <https://github.com/dbpedia-spotlight/dbpedia-spotlight/blob/master/index/src/main/scala/org/dbpedia/spotlight/io/DisambiguationContextSource.scala#L67>.
>>>> For this I have created a PR:
>>>> https://github.com/dbpedia/extraction-framework/pull/377
>>>>
>>>> 2) For the stopword filter defined here
>>>> <https://github.com/dbpedia-spotlight/dbpedia-spotlight/blob/master/index/src/main/scala/org/dbpedia/spotlight/util/ExtractCandidateMap.scala#L186>,
>>>> I analysed the concept URI extraction using the stopword list
>>>> here
>>>> <http://wifo5-04.informatik.uni-mannheim.de/downloads/release-0.4/stopwords.en.list>.
>>>> The analysis showed that we are discarding around 25481 entities,
>>>> almost all of which belong to important categories like music, film,
>>>> band etc. E.g. Am_(musician)
>>>> <http://en.wikipedia.org/wiki/AM_(musician)>, Home_(2015_film)
>>>> <http://en.wikipedia.org/wiki/Home_(2015_film)>, The_Who
>>>> <http://en.wikipedia.org/wiki/The_Who> etc. Even if we do a case-sensitive
>>>> check (checking whether the entity contains more than one capital
>>>> letter, as one is the default), we will still reject some entities that
>>>> have only one word, like Am, Home etc. Moreover, the garbage (can't etc.) we
>>>> will incur after removing this filter won't be much. So I suggest that we
>>>> remove this filter.
>>>>
>>>> 3) I would like to suggest a way of extracting surface forms. If we extract
>>>> the bold text in the first line of a Wikipedia article, we can use it as a
>>>> probable surface form for that entity. E.g. Stanford_University
>>>> <http://en.wikipedia.org/wiki/Stanford_University>, Aon_(company)
>>>> <http://en.wikipedia.org/wiki/Aon_%28company%29>, Radio_Warwick
>>>> <http://en.wikipedia.org/wiki/Radio_Warwick>, Phi_Gamma_Delta
>>>> <http://en.wikipedia.org/wiki/Phi_Gamma_Delta> etc. The bold text gives the
>>>> best surface forms for the respective entities.
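For illustration, the bold-text idea in point 3 could be sketched roughly as
below. This is only a sketch in Python, not code from Spotlight or the
extraction framework. It assumes raw wikitext as input, that bold is marked
with triple apostrophes, and that each paragraph sits on one line; the helper
names first_prose_line and bold_surface_forms are hypothetical.

```python
import re
from typing import List

# Wikitext marks bold with three apostrophes: '''text'''.
BOLD = re.compile(r"'{3}(.+?)'{3}")

def first_prose_line(wikitext: str) -> str:
    """Return the first line that looks like prose.

    Assumption: one paragraph per line, and lines that start with
    template, table, heading, comment or list markup are not prose.
    """
    for line in wikitext.splitlines():
        stripped = line.strip()
        if not stripped or stripped[0] in "{|}=<!*#:":
            continue
        if stripped.startswith(("[[File:", "[[Image:")):
            continue
        return stripped
    return ""

def bold_surface_forms(wikitext: str) -> List[str]:
    """Collect the bold spans of the first prose line, in order, without duplicates."""
    forms: List[str] = []
    for match in BOLD.findall(first_prose_line(wikitext)):
        # Bold-italic (five apostrophes) leaves stray apostrophes; strip them.
        form = match.strip("' ")
        if form and form not in forms:
            forms.append(form)
    return forms

SAMPLE = """{{Infobox university
| name = Stanford University
}}
'''Stanford University''' (officially '''Leland Stanford Junior University''') is a private research university.

== History ==
'''Not a surface form''' because it is not in the first paragraph.
"""

print(bold_surface_forms(SAMPLE))
```

Only the first prose line is inspected, so bold text further down the article
is ignored. Real articles would need more careful handling of nested templates
and links inside the bold span.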
>>>>
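The stopword analysis in point 2 can be illustrated with a small sketch as
well. This is not the filter from ExtractCandidateMap.scala. It assumes a
surface form is rejected when every one of its tokens is a stopword, and that
the case-sensitive variant keeps a form with more than one capital letter. The
stopword set below is a toy subset, and all function names are hypothetical.

```python
import re
from typing import Iterable, List, Set

def surface_form(uri_title: str) -> str:
    """Turn 'Home_(2015_film)' into 'Home': drop a trailing disambiguation
    suffix and replace underscores with spaces."""
    title = re.sub(r"_\([^)]*\)$", "", uri_title)
    return title.replace("_", " ").strip()

def is_all_stopwords(form: str, stopwords: Set[str]) -> bool:
    """Assumed filter rule: reject when every token is a stopword."""
    tokens = form.lower().split()
    return bool(tokens) and all(token in stopwords for token in tokens)

def rejected(titles: Iterable[str], stopwords: Set[str],
             case_sensitive: bool = False) -> List[str]:
    """Return the titles that the assumed filter would discard."""
    out: List[str] = []
    for title in titles:
        form = surface_form(title)
        if not is_all_stopwords(form, stopwords):
            continue
        # Case-sensitive variant: more than one capital letter keeps the form.
        if case_sensitive and sum(ch.isupper() for ch in form) > 1:
            continue
        out.append(title)
    return out

# Toy subset for illustration only, not the real stopwords.en.list.
STOPWORDS = {"am", "home", "the", "who", "can't"}
TITLES = ["Am_(musician)", "Home_(2015_film)", "The_Who", "Stanford_University"]

print(rejected(TITLES, STOPWORDS))
# ['Am_(musician)', 'Home_(2015_film)', 'The_Who']
print(rejected(TITLES, STOPWORDS, case_sensitive=True))
# ['Am_(musician)', 'Home_(2015_film)']
```

Under these assumptions the case-sensitive check rescues The_Who, while
single-word titles such as Am and Home are still dropped, which is the
situation described in point 2.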
>>>> Thanks,
>>>> Abhishek
>>>>
>>>> On Fri, Mar 27, 2015 at 11:56 AM, Abhishek Gupta <a.gu...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I would also like to inform you that my proposal became public in one
>>>>> of the recent mails, when Thiago accidentally sent a mail to both me
>>>>> and the dbpedia-gsoc mailing list. Details of the mail are below. The
>>>>> Google Docs link was in the quoted text, and the doc could be seen and
>>>>> even edited by anyone with that link, but nobody has changed its content.
>>>>> I believe there is a chance that someone will copy my ideas, so
>>>>> I request you to take care of this issue. I hope this will not
>>>>> affect my application.
>>>>> I have now changed the sharing settings, so please let me know if
>>>>> there is any access problem.
>>>>>
>>>>> *Mail details:*
>>>>> from: Thiago Galery <tgal...@gmail.com>
>>>>> to: Abhishek Gupta <a.gu...@gmail.com>,
>>>>> dbpedia-gsoc <dbpedia-gsoc@lists.sourceforge.net>
>>>>> date: Tue, Mar 24, 2015 at 3:47 AM
>>>>> subject: Re: [Dbpedia-gsoc] Fwd: Contribute to DbPedia
>>>>>
>>>>> I have also modified the Candidate Entity Scoring methodology in my
>>>>> proposal. Please take a look at it.
>>>>> GSoC proposal link:
>>>>> https://www.google-melange.com/gsoc/proposal/review/student/google/gsoc2015/abhishek_g/5629499534213120
>>>>> Google Docs Link:
>>>>> https://docs.google.com/document/d/1U4BvJpGUvL2odVA6VxnYggfEX_hmLSYP4yqhXB7dLQU/edit
>>>>>
>>>>> Moreover, I would like to ask one more question which might help me in
>>>>> modelling the problem. In the example texts below, which entity would you
>>>>> like to annotate "river" (in bold) with: "http://dbpedia.org/page/River",
>>>>> "http://dbpedia.org/page/River_Thames", or something else?
>>>>> 1. The River Thames is a river that flows through southern England.
>>>>> This *river* is the longest entirely in England and the second longest
>>>>> in the United Kingdom, after the River Severn.
>>>>> 2. I would like to swim in the longest *river* entirely in England
>>>>> and the second longest in the United Kingdom, after the River Severn.
>>>>>
>>>>> In the first example, "river" explicitly refers to the River Thames,
>>>>> much like co-reference resolution. In the second example
>>>>> there is an implicit reference to the River Thames, as it is the longest
>>>>> river entirely in England etc., which we can infer from the context. So I
>>>>> would like to know whether we are trying to annotate "river" with a simple
>>>>> "River" or with "River_Thames".
>>>>>
>>>>> Thanks,
>>>>> Abhishek
>>>>>
>>>>>
>>>>
>>>
>>
>
>
> ------------------------------------------------------------------------------
> One dashboard for servers and applications across Physical-Virtual-Cloud
> Widest out-of-the-box monitoring support with 50+ applications
> Performance metrics, stats and reports that give you Actionable Insights
> Deep dive visibility with transaction tracing using APM Insight.
> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
> _______________________________________________
> Dbpedia-gsoc mailing list
> Dbpedia-gsoc@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
>
>