Oh, right! Yes! Thanks, Max. I thought I was going crazy for suggesting
using Nutch, but now I remember why. Now I think I am going crazy for going
back on my suggestion of using Nutch. :) Damn, I'm doing too much stuff at
the same time! Great to have awesome people around who pick up the ball
(and score a goal) when I drop it! :)

Cheers,
Pablo


On Tue, Apr 16, 2013 at 11:10 AM, Max Jakob <[email protected]> wrote:

> Hi all,
>
> On Tue, Apr 16, 2013 at 9:54 AM, Pablo N. Mendes <[email protected]>
> wrote:
> > I seem to have missed that the annotations already come with TOKEN
> > annotations.
>
> I'm afraid these TOKEN annotations are not usable for our context
> models, because they are "The byte offset of the 10 least frequent
> words on the page, to act as a signature to ensure that the underlying
> text hasn’t changed -- think of this as a version, or fingerprint, of
> the page." [1]
>
> The blog post goes on to say that there are "Software tools (on the
> UMass site [2]) to: download the web pages; extract the mentions,
> [...]; select the text around the mentions as local context; and
> compute evaluation metrics over predicted entities." [1]
>
> But [2] says that "We are currently writing code to download the
> webpages listed in the above dataset, to find the relevant links from
> these webpages, and to extract the context around the links. The
> resulting dataset will also be released when ready, and will be linked
> here."
> At this point in time, only a bash command that downloads all the
> required web pages is provided.
>
> Maybe it is a good idea to write our own extractors for this?
>
> Cheers,
> Max
>
>
> [1]
> http://googleresearch.blogspot.nl/2013/03/learning-from-big-data-40-million.html
> [2] http://www.iesl.cs.umass.edu/data/wiki-links
>

-- 

Pablo N. Mendes
http://pablomendes.com
_______________________________________________
Dbpedia-gsoc mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc