Hi Gaurav,
I don't know your exact use case, but here's what we do. There is an IRC channel
where Wikipedia continuously lists pages as they change. We listen to the
channel and, every hour, compile a list of the unique pages that changed.
Wikipedia's MediaWiki software provides an API for downloading these pages in
bulk. The request looks like this:
"http://en.wikipedia.org/w/api.php?action=query&export&exportnowrap&prop=revisions&rvprop=timestamp|content&titles="
You can download these pages and put them in the same format as the full dump
download by appending the Wikipedia namespace list (you can get the list from
"http://en.wikipedia.org/w/api.php?action=query&export&exportnowrap&prop=revisions&rvprop=timestamp|content").
After that, you can put the file in the same location as the full dump and
invoke the extraction code. It works as expected.
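
As a rough illustration of the hourly step, here is a minimal Python sketch, not
our actual code. It only deduplicates the changed titles and builds the export
URLs shown above. The function names (unique_titles, export_urls) and the batch
size of 50 titles per request are assumptions made for this example; check the
API limits that apply to your account before relying on that number.

```python
from urllib.parse import quote

# Export URL quoted in this mail; page titles are appended after "titles=".
API_BASE = (
    "http://en.wikipedia.org/w/api.php?action=query&export&exportnowrap"
    "&prop=revisions&rvprop=timestamp|content&titles="
)


def unique_titles(changed):
    """Return the changed titles without duplicates, in first-seen order."""
    cleaned = (title.strip() for title in changed)
    return list(dict.fromkeys(title for title in cleaned if title))


def export_urls(titles, batch_size=50):
    """Build one export URL per batch of titles (batch_size is an assumption)."""
    urls = []
    for start in range(0, len(titles), batch_size):
        batch = titles[start:start + batch_size]
        # Titles are joined with "|"; everything else is percent-encoded.
        urls.append(API_BASE + quote("|".join(batch), safe="|"))
    return urls


if __name__ == "__main__":
    # Hypothetical titles, standing in for one hour of IRC change notices.
    seen = ["Berlin", "Paris", "Berlin", "New York"]
    for url in export_urls(unique_titles(seen)):
        print(url)  # fetch each URL, e.g. with urllib.request.urlopen(url)
```

Each URL returns the requested pages as export XML, which you can then combine
with the namespace list as described above.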
Regards
Amit
From: gaurav pant <[email protected]>
Date: Thursday, March 7, 2013 12:17 PM
To: Dimitris Kontokostas <[email protected]>,
"[email protected]" <[email protected]>
Subject: Re: [Dbpedia-discussion] page article has last modified timestamp
Hi All,
Thanks, Dimitris, for your help.
I would also like one more confirmation from you.
I just went through the code of InfoboxExtractor. It seems to me that the code
is written to process the data page by page (<page>..</page>). If I remove all
those pages from the "page-article" dump using a Perl/Python script and then
apply Infobox extraction or Abstract extraction, we will get only the updated
triples as output, like DBpedia Live for English.
Please correct me if I am wrong.
Thanks
_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion