Dear Cai Zhiwei,
It would be nice to have Scala code that:
- takes in a list of URIs
- goes through the annotations provided by Google for those URIs
- uses something like Nutch to crawl the pages referenced in the Google
Mentions Corpus for those annotations.
That way, you can start with a small set of URIs for tests, and increase to
larger sets later (e.g. get mentions for all People), or even to the whole
collection.
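
The steps above could be sketched roughly as follows. This is only a minimal sketch under assumptions: it assumes the corpus is a text file with tab-separated "URL" lines (one per web page), each followed by "MENTION" lines (anchor text, offset, target URI), which is not confirmed anywhere in this thread. The names MentionSeeds, Mention, parse and seedUrls are made up for illustration. It covers only the first two steps (finding the pages whose annotations point at the given URIs); the crawling itself is left to Nutch or a similar tool.

```scala
// Sketch only: the tab-separated "URL" / "MENTION" record layout is an
// assumption about the corpus format, not something stated in this thread.
object MentionSeeds {
  // One annotation: the page it occurs on, the anchor text, and the target URI.
  final case class Mention(pageUrl: String, anchorText: String, targetUri: String)

  // Walk the corpus lines in order, remembering the latest "URL" line as the
  // page that the following "MENTION" lines belong to.
  def parse(lines: Iterator[String]): Iterator[Mention] = {
    var currentPage: Option[String] = None
    lines.flatMap { line =>
      val parsed: Option[Mention] = line.split("\t", -1) match {
        case Array("URL", page) =>
          currentPage = Some(page)
          None
        case Array("MENTION", anchor, _, target) =>
          currentPage.map(page => Mention(page, anchor, target))
        case _ =>
          None // blank lines and any other record types are skipped
      }
      parsed.iterator
    }
  }

  // Keep only the pages that mention one of the requested URIs.
  // The result is the list of page URLs a crawler would need to fetch.
  def seedUrls(lines: Iterator[String], wanted: Set[String]): List[String] =
    parse(lines)
      .filter(m => wanted.contains(m.targetUri))
      .map(_.pageUrl)
      .toList
      .distinct
}
```

For a test run, seedUrls could be called with scala.io.Source.fromFile(...).getLines() and a handful of URIs, and the resulting list written out one URL per line as a seed list for the crawler.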
Cheers,
Pablo
On Tue, Apr 16, 2013 at 7:38 AM, Dimitris Kontokostas <[email protected]> wrote:
> Great!
>
> The "Google Corpus" idea description page [1] is updated, you can also
> look at the related thread for more information.
> For warm up tasks on DBpedia Spotlight you can take a look here [2] first.
> Then Pablo, Max or Jo can give you a more specific task.
>
> Best,
> Dimitris
>
> [1] http://wiki.dbpedia.org/gsoc2013/ideas/GoogleCorpus
> [2] https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Warm-up-tasks
>
>
> On Sat, Apr 13, 2013 at 9:03 PM, 蔡志威 <[email protected]> wrote:
>
>> In the last 4 days, I've finished the following work:
>> 1. Read most of the wiki pages on GitHub for the DBpedia extraction framework
>> and DBpedia Spotlight, and also added some notes and corrected some basic mistakes.
>> 2. Downloaded the code and tested some data on my computer.
>> 3. Set up my dev environment with IntelliJ IDEA.
>> 4. Learnt Maven and Scala, so I could get a basic idea of the structure
>> of the whole project.
>> 5. I found I might prefer the idea "Generalize input formats and add
>> support for Google mention corpus", so I am trying to get familiar with the
>> Wikipedia dump format and the Google mention corpus.
>>
>> I would be grateful if you could give me some suggestions for the
>> following days: code and materials to read, issues to solve, or
>> other things that can help me get a deeper understanding of this idea.
>>
>> Thanks for your time,
>> Cai Zhiwei
>>
>>
>> ------------------------------------------------------------------------------
>> Precog is a next-generation analytics platform capable of advanced
>> analytics on semi-structured data. The platform includes APIs for building
>> apps and a phenomenal toolset for data science. Developers can use
>> our toolset for easy data analysis & visualization. Get a free account!
>> http://www2.precog.com/precogplatform/slashdotnewsletter
>> _______________________________________________
>> Dbpedia-gsoc mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
>>
>>
>
>
> --
> Kontokostas Dimitris
>
>
--
Pablo N. Mendes
http://pablomendes.com