Hi,

On Mon, Apr 29, 2013 at 2:20 AM, Denis Lukovnikov
<[email protected]> wrote:
> I've also got a question for the first existing Spotlight idea (Google
> mention corpus): Is this just like rebuilding the Lexicalization dataset
> using Google data with a few MapReduce jobs? Or is there more to it?

There is a bit more to it: the context used to build the disambiguation models will also be extracted from the corpus.


> As for the worry for the overlap between Google mention corpus and the
> current data, doesn't the Google corpus also contain all mentions on
> Wikipedia as well?

I have not looked into it, but I assume they filtered Wikipedia pages out of the corpus.


> Also, I don't get why we need to crawl
> pages (from a discussion with Cai Zhiwei) to generate a new Lexicalization
> dataset. Or do we need to crawl them to get the context of the mention?

Yes, exactly: we need to crawl the pages to obtain the context around each mention.
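To make the "crawl for context" step concrete, here is a minimal sketch of what processing one crawled page might look like: find anchors that link to Wikipedia (the mentions) and keep a window of surrounding words as disambiguation context. All names here are illustrative assumptions, not part of any Spotlight API, and a real pipeline would of course fetch pages over HTTP rather than use an inline string.

```python
# Hypothetical sketch: extract mention contexts from one crawled HTML page.
# "Mentions" are anchors linking to Wikipedia; the context is a window of
# surrounding words. Class and variable names are illustrative only.
import re
from html.parser import HTMLParser

class MentionContextExtractor(HTMLParser):
    def __init__(self, window=5):
        super().__init__()
        self.window = window          # words of context on each side
        self.tokens = []              # running token stream of the page text
        self.mentions = []            # (wikipedia_url, token_index) pairs
        self._current_href = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            if "wikipedia.org/wiki/" in href:
                self._current_href = href

    def handle_endtag(self, tag):
        if tag == "a":
            self._current_href = None

    def handle_data(self, data):
        words = re.findall(r"\w+", data)
        if self._current_href and words:
            # record the mention at its position in the token stream
            self.mentions.append((self._current_href, len(self.tokens)))
        self.tokens.extend(words)

    def contexts(self):
        # yield (target URL, context window) for every mention found
        for url, idx in self.mentions:
            lo = max(0, idx - self.window)
            hi = idx + self.window
            yield url, " ".join(self.tokens[lo:hi])

page = ('<p>The city of <a href="http://en.wikipedia.org/wiki/Berlin">'
        'Berlin</a> is the capital of Germany.</p>')
extractor = MentionContextExtractor(window=3)
extractor.feed(page)
for url, ctx in extractor.contexts():
    print(url, "->", ctx)
```

The same per-page logic would slot naturally into the map phase of the MapReduce jobs mentioned earlier, emitting (surface form, URI, context) records to be aggregated downstream.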


Cheers,
Max

_______________________________________________
Dbpedia-gsoc mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
