2013/12/4 Paul Houle ontolo...@gmail.com
I think I could get this data out of some API, but there are great
HTML5 parsing libraries now, so a link extractor for HTML can be
built as quickly as an API client.
There are two big advantages of looking at links in HTML: (i) you can
use
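The link-extractor idea above can be sketched in a few lines. This is a hypothetical minimal version using Python's standard-library html.parser (not a full HTML5 parser like html5lib, but enough to show the shape of the approach):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags as the parser streams through HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the start tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

extractor = LinkExtractor()
extractor.feed('<p>See <a href="/wiki/Semantic_Web">Semantic Web</a>.</p>')
print(extractor.links)  # → ['/wiki/Semantic_Web']
```

For real Wikipedia pages you would feed whole article HTML and then filter the collected hrefs (e.g. keep only /wiki/ paths).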
@Paul,
unfortunately HTML Wikipedia dumps are not released anymore (the
existing ones are old static dumps, as you said).
This is a problem for a project like DBpedia, as you can easily understand.
Moreover, I did not mean that it is not possible to crawl Wikipedia
instances or load a dump into a private
The DBpedia Way of extracting the citations probably would be to
build something that treats the citations the way infoboxes are
treated.
It's one way of doing things, and it has its own integrity, but
it's not the way I do things. (DBpedia does it this way about as well
as it can be done,
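Treating citations the way infoboxes are treated would mean pulling {{cite ...}} templates out of the wikitext and mapping their key=value fields to properties. A naive sketch of that extraction step (hypothetical; citation templates can nest, so production code would need a real wikitext parser such as mwparserfromhell rather than this flat regex):

```python
import re

# Matches flat, non-nested {{cite <kind> |k=v |k=v}} templates only.
CITE_RE = re.compile(r"\{\{\s*cite\s+([a-z]+)\s*\|([^{}]*)\}\}", re.IGNORECASE)

def extract_citations(wikitext):
    """Return a list of (template kind, field dict) pairs from wikitext."""
    citations = []
    for kind, body in CITE_RE.findall(wikitext):
        fields = {}
        for part in body.split("|"):
            if "=" in part:
                key, _, value = part.partition("=")
                fields[key.strip()] = value.strip()
        citations.append((kind.lower(), fields))
    return citations

text = 'A fact.{{cite web |url=http://example.org |title=Example}}'
print(extract_citations(text))
# → [('web', {'url': 'http://example.org', 'title': 'Example'})]
```

The field dicts would then go through a mapping step, analogous to the infobox-to-ontology mappings DBpedia already maintains.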
I just released a version of Infovore that can do scalable
differencing of RDF data sets, producing output in the RDF Patch
format
http://afs.github.io/rdf-patch/
The tool is written up here
https://github.com/paulhoule/infovore/wiki/rdfDiff
I ran this against two different weeks of Freebase
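The core of such a diff can be sketched in-memory: RDF Patch expresses changes as "A" (add) and "D" (delete) lines. This toy version just set-differences two collections of N-Triples lines; Infovore's rdfDiff does the same comparison at Hadoop scale rather than in memory:

```python
def rdf_diff(old_triples, new_triples):
    """Diff two iterables of N-Triples lines, emitting RDF Patch-style
    output: 'D <triple>' for deletions, 'A <triple>' for additions.
    In-memory sketch only; assumes both inputs fit in RAM."""
    old = set(old_triples)
    new = set(new_triples)
    patch = []
    for t in sorted(old - new):   # triples gone from the new dataset
        patch.append("D " + t)
    for t in sorted(new - old):   # triples new in the new dataset
        patch.append("A " + t)
    return patch

old = ['<a> <p> "1" .', '<a> <p> "2" .']
new = ['<a> <p> "2" .', '<a> <p> "3" .']
print("\n".join(rdf_diff(old, new)))
# → D <a> <p> "1" .
#   A <a> <p> "3" .
```

Sorting the output is a convenience for readability here, not something the RDF Patch format requires.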