> Is this feed something DBPedia specific you will get access to or is
> it something everyone can access? Is this feed the same thing as these
> IRC feeds?
> http://meta.wikimedia.org/wiki/IRC_channels#Recent_changes
We are using a password-protected Wikipedia feed, which serves all changed
articles with their full markup text in real time. The difference from the
freely available IRC channel is that the IRC channel only serves the URIs of
changed articles, not their text. So one would have to fetch the article text
from Wikipedia via this API:

http://en.wikipedia.org/wiki/Special:Export/Berlin

and Wikipedia does block IPs that hit that API too hard. The live feed also
includes a nice protocol and MediaWiki plugin support for setting up a local,
up-to-date mirror of Wikipedia.

Cheers,
Georgi

>
> On Thu, Jun 18, 2009 at 2:47 PM, Georgi
> Kobilarov<[email protected]> wrote:
> > Hi Omid,
> >
> > it is true (as Brian wrote) that the Wikimedia Foundation has offered us
> > their Wikipedia live feed. And we are in the process of developing and
> > deploying a real-time update version of DBpedia. Jens also wrote about
> > that recently on the mailing list.
> >
> > The reason that we need some time for preparing DBpedia 3.3 is that
> > there have been a bunch of changes to the DBpedia extraction framework,
> > some experiments with the code base, and we simply need to get things
> > together again. Usually, it takes around 1 day to process a whole
> > Wikipedia dump, including importing it into MySQL and running the
> > extraction.
> >
> > But due to the bunch of code changes, we were facing some bugs and
> > extraction errors. Nothing to worry about, but it requires time to get
> > up to speed again. And since DBpedia is still kind of a spare-time
> > project for all participants, alongside our other research projects, we
> > don't always find the time to work on it.
> >
> > In the long term, DBpedia will probably rely only on the Wikipedia
> > update feed instead of the Wikipedia dump files. There will be daily or
> > weekly diffs, monthly full dumps and hopefully a DBpedia live feed as
> > well.
> >
> > I hope that sounds reasonable.
> > If not, please let us know.
> >
> > Cheers,
> > Georgi
> >
> > --
> > Georgi Kobilarov
> > Freie Universität Berlin
> > www.georgikobilarov.com
> >
> >> -----Original Message-----
> >> From: Omid [mailto:[email protected]]
> >> Sent: Thursday, June 18, 2009 10:45 PM
> >> To: Georgi Kobilarov
> >> Cc: [email protected]
> >> Subject: Re: [Dbpedia-discussion] DBPedia freshness
> >>
> >> Thanks Georgi,
> >>
> >> I have also noticed that Wikipedia has significantly increased the
> >> frequency with which they release their dumps. I remember there
> >> was a period from October 2008 to early this year when no new dumps
> >> were completed for 5-6 months.
> >> The question is how much manual work and how much processing time it
> >> takes for DBPedia to release a new dump once a new Wikipedia dump is
> >> released.
> >> Assuming Wikipedia started releasing complete data dumps on a daily
> >> basis, would DBPedia theoretically be able to release dumps daily
> >> as well?
> >> Or does the processing itself require, for example, one week,
> >> making it impossible to keep DBPedia fresh daily even if Wikipedia's
> >> data dumps were fresh daily?
> >>
> >> Basically, I am trying to figure out the minimum delay between the
> >> release of a new Wikipedia dump and the release of a new DBPedia
> >> with the current DBPedia scripts.
> >> Also, if the process currently involves many manual steps (downloading
> >> the Wikipedia dump, processing the data, etc.), is it something that
> >> could easily be automated so that keeping DBPedia fresh would not
> >> involve any human intervention?
> >>
> >> Thanks
> >> /Omid
> >>
> >>
> >> On Thu, Jun 18, 2009 at 12:20 PM, Georgi
> >> Kobilarov<[email protected]> wrote:
> >> > Hi Omid,
> >> >
> >> > there are several Wikipedia dump files we are importing in order to
> >> > extract the data for DBpedia (see the importwiki.php in the DBpedia
> >> > SVN).
> >> >
> >> > It is true that DBpedia is quite out of date at the moment. There has
> >> > been a lack of Wikipedia dumps during winter and spring, but Wikipedia
> >> > recently started to publish dumps much more frequently. We are currently
> >> > in the process of preparing DBpedia 3.3, based on a late May dump of the
> >> > English Wikipedia (and dumps of other languages from around that time).
> >> >
> >> > I can only roughly estimate when DBpedia 3.3 will be available, but keep
> >> > an eye on the DBpedia mailing list around the end of next week...
> >> >
> >> > Cheers,
> >> > Georgi
> >> >
> >> > --
> >> > Georgi Kobilarov
> >> > Freie Universität Berlin
> >> > www.georgikobilarov.com
> >> >
> >> >> -----Original Message-----
> >> >> From: Omid [mailto:[email protected]]
> >> >> Sent: Thursday, June 18, 2009 9:00 PM
> >> >> To: [email protected]
> >> >> Subject: [Dbpedia-discussion] DBPedia freshness
> >> >>
> >> >> Can someone let me know which Wikipedia data dump file is
> >> >> the input to DBPedia?
> >> >>
> >> >> On http://wiki.dbpedia.org/Documentation it says "...all articles from
> >> >> the Wikipedia SQL-Dump...".
> >> >>
> >> >> Is this the one we are talking about?
> >> >> http://download.wikimedia.org/enwiki/latest/enwiki-latest-page.sql.gz
> >> >>
> >> >> Or is another file used as input to the DBPedia
> >> >> system?
> >> >>
> >> >> Also, I see that the latest dump of DBPedia is 8 months old (from
> >> >> October 2008).
> >> >> Is there anything preventing DBPedia from creating a fresher dump from
> >> >> the data at http://download.wikimedia.org/enwiki/latest/?
> >> >> I'm curious whether the data is stale because someone actually has to
> >> >> download the Wikipedia data manually and run the scripts (and it has
> >> >> just not been done yet), or whether the issue is technical and the
> >> >> process has failed with newer data.
> >> >>
> >> >>
> >> >> Thanks
> >> >> /Omid
> >> >>
> >> >>
> >> >> ------------------------------------------------------------------------------
> >> >> Crystal Reports - New Free Runtime and 30 Day Trial
> >> >> Check out the new simplified licensing option that enables unlimited
> >> >> royalty-free distribution of the report engine for externally facing
> >> >> server and web deployment.
> >> >> http://p.sf.net/sfu/businessobjects
> >> >> _______________________________________________
> >> >> Dbpedia-discussion mailing list
> >> >> [email protected]
> >> >> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
> >> >
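To illustrate Georgi's point about the public IRC channel, which only announces the URIs of changed articles, here is a minimal, hypothetical sketch of what a client would then have to do: fetch each article's text from the Special:Export URL quoted in his reply while spacing out requests so it does not hammer the API. Everything beyond that URL is an assumption for illustration. The names `export_url`, `PoliteExporter` and `fetch_changed` are made up, the 1-second minimum interval is an arbitrary choice rather than a documented Wikipedia limit, the User-Agent string is a placeholder, and deduplicating repeated titles within a batch is one possible way to cut load, not something the thread describes.

```python
import time
import urllib.parse
import urllib.request

# Export endpoint taken from the example URL in Georgi's reply.
EXPORT_BASE = "http://en.wikipedia.org/wiki/Special:Export/"


def export_url(title):
    """Build a Special:Export URL for an article title (spaces become underscores)."""
    return EXPORT_BASE + urllib.parse.quote(title.replace(" ", "_"), safe="/:")


class PoliteExporter:
    """Hypothetical fetcher that leaves at least `min_interval` seconds between requests.

    The 1-second default is an arbitrary assumption, not a documented Wikipedia limit.
    `clock`, `sleep` and `opener` are injectable so the throttling can be tested offline.
    """

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep, opener=None):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._opener = opener or self._http_get
        self._last_request = None

    @staticmethod
    def _http_get(url):
        # Placeholder User-Agent string (hypothetical); a real client should identify itself.
        req = urllib.request.Request(url, headers={"User-Agent": "example-mirror/0.1"})
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.read().decode("utf-8")

    def fetch(self, title):
        # Sleep just long enough that consecutive requests are min_interval apart.
        if self._last_request is not None:
            wait = self.min_interval - (self._clock() - self._last_request)
            if wait > 0:
                self._sleep(wait)
        self._last_request = self._clock()
        return self._opener(export_url(title))


def fetch_changed(titles, exporter):
    """Fetch each title announced by the change feed once, skipping repeats in the batch."""
    pages = {}
    for title in titles:
        if title not in pages:
            pages[title] = exporter.fetch(title)
    return pages
```

With the default opener, `fetch_changed(["Berlin", "Berlin"], PoliteExporter())` would issue a single HTTP request. The rate limit lives on the client side, which is exactly the burden the password-protected live feed avoids by shipping the full article text with every change.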
