Hi Haiyang,

On 24 April 2013 18:31, haiyang liu <[email protected]> wrote:
> Hi Dimitris,
> Thanks for the info, I would like to start on the continuous extraction
> project.
> Based on the info on the idea page, there are a lot of sub-projects,

Just for the record, here's the link again:
http://wiki.dbpedia.org/gsoc2013/ideas/ContinuousExtraction

> do they need to be done in order?

Not necessarily, but it would probably make things easier for you. For
example, a unified configuration does not depend on any other steps, but
without it, implementing the other steps will be difficult.

> Can you help direct me to a point to start, like a warm up task for this
> project?

A first warm up task would be to simply get acquainted with the DBpedia
extraction framework, i.e. mostly with the core, dump and scripts
modules, but also to some extent with the others, at least to know what
they do and which parts of the core and/or dump modules they depend on.

Then use the download script to download Wikipedia dumps. Run some
extractions.

Then look at the configuration files: there are several different
.properties files with different syntax and semantics, there are
parameters in pom.xml, some parameters must be given on the command
line, and so on.

Then look at Spring and some other dependency injection (DI) frameworks.
In a perfect world, someone would have written a DI framework that
leverages Scala's abilities. Find that perfect framework. Or build it. ;-)

You may want to think about an architecture for this project: should all
the different extraction steps run in the same JVM, or should they run
in separate processes? This decision has huge implications for memory
consumption, modularity, communication protocols between the extraction
steps, etc. Should different languages use the same extractor objects?
And so on.

>>> I am a big fan of machine learning and data analysis and just did a
>>> research project in news processing.

It would be great if you took on the continuous extraction project, and
it is an interesting and challenging project - you will need / acquire
knowledge about large multi-threaded Java server applications with
complex interactions between modules.
But I think that it probably doesn't have as much to do with machine
learning etc. as some other project ideas. At least that's my point of
view, maybe the other developers can shed a different light on this
question. I'm just telling you this so you're not disappointed when you
find out that there are no statistics or heuristics involved in this
project.

Remember that project ideas are just that - ideas. In this case, I have
been thinking about this idea for several months, so I have relatively
precise thoughts about how it could or should be implemented. But in the
end, you are welcome to come up with some totally new ideas that we have
never thought of.

Cheers,
Christopher

> Thanks a lot!
>
>
> On Wed, Apr 24, 2013 at 2:03 AM, Dimitris Kontokostas <[email protected]>
> wrote:
>>
>> Hi Haiyang & welcome,
>>
>> Coming late gives you less time to prepare but we still have ~10 days left.
>>
>> We have a few students interested in ideas #1 & #2 but this doesn't
>> necessarily mean that they will apply in the end. All discussions here are
>> public and happen on this mailing list so you can read the archives and
>> judge for yourself.
>> Depending on what idea(s) you choose there can be different warm-up tasks
>> so, feel free to ask :)
>>
>> Cheers,
>> Dimitris
>>
>>
>> On Wed, Apr 24, 2013 at 1:38 AM, Haiyang Liu <[email protected]>
>> wrote:
>>>
>>> Hi,
>>> My name is Haiyang, I am a rising senior student from Rice University.
>>> I am a big fan of machine learning and data analysis and just did a
>>> research project in news processing.
>>> I am very interested in the DBpedia project and just heard from a friend
>>> that it has this GSoC project that I can join in.
>>> I know it is kind of late since the deadline is approaching so I want to
>>> know if it is still possible for me to apply at this time.
>>> I am really interested in the 3 topics in the idea list:
>>> 1) Massive extraction of triples from Media wikis
>>> 2) Wiktionary 2 RDF Assistance GUI
>>> 3) Continuous Extraction
>>> I am wondering if anyone has already started working on these
>>> projects and whether I should choose others that are less competitive
>>> to apply.
>>> Right now I am looking through the doc and warm-up exercises and hope
>>> someone can help me with my questions.
>>> Thanks a lot!
>>>
>>> Haiyang Liu
>>> Rice University
>>> CS 2014
>>>
>>> _______________________________________________
>>> Dbpedia-gsoc mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
>>>
>>
>> --
>> Kontokostas Dimitris
