Hi,

On Mon, Apr 29, 2013 at 2:20 AM, Denis Lukovnikov <[email protected]> wrote:

> I've also got a question for the first existing Spotlight idea (Google
> mention corpus): Is this just like rebuilding the Lexicalization dataset
> using Google data with a few MapReduce jobs? Or is there more to it?
The context to build the disambiguation models will also be extracted from it.

> As for the worry about the overlap between the Google mention corpus and
> the current data, doesn't the Google corpus also contain all mentions on
> Wikipedia as well?

I did not look into it, but I assume they filtered Wikipedia pages out of it.

> Also, I don't get why we need to crawl pages (from a discussion with Cai
> Zhiwei) to generate a new Lexicalization dataset. Or do we need to crawl
> them to get the context of the mention?

Yes, exactly: for the context.

Cheers,
Max
_______________________________________________
Dbpedia-gsoc mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
