Hi Naveen,
To create a Spotlight model, we have to process the Wikipedia dump to
extract some statistics from it.
These statistics include the probability of seeing a surface form, a
context vector for each entity, etc.
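As a rough illustration of the surface form statistics, here is a minimal Python sketch. It assumes we already have (surface form, entity) pairs extracted from wiki links, plus a count of how often each string occurs in the text at all. The toy data and the function name are made up for illustration; this is not Spotlight's actual pipeline.

```python
from collections import Counter, defaultdict

# Hypothetical toy input: (surface form, linked entity) pairs as they might be
# extracted from anchor texts in a Wikipedia dump.
links = [
    ("Berlin", "Berlin"),
    ("Berlin", "Berlin"),
    ("Berlin", "Berlin_(band)"),
    ("Big Apple", "New_York_City"),
]

# Hypothetical toy input: how often each string occurs in the text overall,
# whether it is linked or not.
total_occurrences = {"Berlin": 10, "Big Apple": 4}


def surface_form_stats(links, total_occurrences):
    """Return (annotation probability per surface form, P(entity | surface form))."""
    sf_counts = Counter(sf for sf, _ in links)
    pair_counts = Counter(links)

    # P(string is a link | string occurs)
    annotation_prob = {
        sf: sf_counts[sf] / total_occurrences[sf] for sf in sf_counts
    }

    # P(entity | surface form), from co-occurrence counts of each pair
    entity_given_sf = defaultdict(dict)
    for (sf, entity), n in pair_counts.items():
        entity_given_sf[sf][entity] = n / sf_counts[sf]

    return annotation_prob, dict(entity_given_sf)


annotation_prob, entity_given_sf = surface_form_stats(links, total_occurrences)
```

On a real dump the two inputs would themselves be the result of a large counting job, which is where a distributed framework comes in.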
- The current scripts that generate those models, written in Pig, appear
to be broken. See the issues below.
- Other projects could probably benefit from this framework if it is done
properly.
- Spark is a good alternative, both because it makes map/reduce problems
easier to model and because it is fast.
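To make the map/reduce point concrete, here is a small sketch in plain Python (not Spark) of building a context vector per entity. The map step emits ((entity, token), 1) for every token in a paragraph that links to the entity, and the reduce step sums the counts per key. The input paragraphs, the whitespace tokenization, and the helper names are assumptions for illustration only; in Spark the same shape would roughly correspond to a flatMap followed by a reduceByKey.

```python
from collections import defaultdict
from functools import reduce
from itertools import chain

# Hypothetical toy input: (entity, text of a paragraph containing a link to it).
paragraphs = [
    ("Berlin", "capital of germany"),
    ("Berlin", "germany capital city"),
    ("Berlin_(band)", "american new wave band"),
]


def map_step(record):
    """Emit ((entity, token), 1) for each whitespace-separated token."""
    entity, text = record
    return [((entity, token), 1) for token in text.lower().split()]


def reduce_step(acc, pair):
    """Add one emitted count to the running total for its (entity, token) key."""
    key, count = pair
    acc[key] += count
    return acc


def context_vectors(paragraphs):
    """Build {entity: {token: count}} from the emitted pairs."""
    emitted = chain.from_iterable(map(map_step, paragraphs))
    counts = reduce(reduce_step, emitted, defaultdict(int))
    vectors = defaultdict(dict)
    for (entity, token), n in counts.items():
        vectors[entity][token] = n
    return dict(vectors)


vectors = context_vectors(paragraphs)
```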
A good start would be:
1. Play a bit with Spotlight ( http://dbpedia-spotlight.github.io/demo/ )
2. Check the warm up tasks
https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Warm-up-tasks
3. Compile and run Spotlight locally
4. Understand what the stores are and what values they contain
5. Take a look at the scripts that generate the current model:
https://github.com/dbpedia-spotlight/pignlproc
Issues:
https://github.com/dbpedia-spotlight/dbpedia-spotlight/issues/329
https://github.com/dbpedia-spotlight/dbpedia-spotlight/issues/321
On Fri, Mar 6, 2015 at 8:59 PM, Naveen Madhire <[email protected]>
wrote:
> Hi Team,
>
>
> I am currently pursuing a Master's in Data Science at Indiana University. I
> am very interested in participating in this year's GSoC, and the idea
> listed on DBpedia's website caught my eye, as I am comfortable with Apache
> Spark, entity linking and Java.
>
> DBpedia Spotlight – Better Tools for Model Creation
>
> I don't see any discussion happening in the archives.
>
> If possible, could anyone share some references to look into, and any details
> that would help me understand the current project?
>
> Please let me know.
>
>
> Thanks,
> Naveen M
>
>
> ------------------------------------------------------------------------------
> Dive into the World of Parallel Programming The Go Parallel Website,
> sponsored
> by Intel and developed in partnership with Slashdot Media, is your hub for
> all
> things parallel software development, from weekly thought leadership blogs
> to
> news, videos, case studies, tutorials and more. Take a look and join the
> conversation now. http://goparallel.sourceforge.net/
> _______________________________________________
> Dbpedia-gsoc mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
>
>