Hi Omid, it is true (as Brian wrote) that the Wikimedia Foundation has offered us their Wikipedia live feed. And we are in the process of developing and deploying a real-time update version of DBpedia. Jens also wrote about that recently on the mailing list.
The reason we need some time to prepare DBpedia 3.3 is that there have been a number of changes to the DBpedia extraction framework and some experiments with the code base, and we simply need to get things together again. Usually, it takes around one day to process a whole Wikipedia dump, including importing it into MySQL and running the extraction. But because of the code changes, we ran into some bugs and extraction errors. Nothing to worry about, but it takes time to get up to speed again. And since DBpedia is still something of a spare-time project for all participants, alongside our other research projects, we don't always find the time to work on it.

In the long term, DBpedia will probably rely only on the Wikipedia update feed instead of the Wikipedia dump files. There will be daily or weekly diffs, monthly full dumps, and hopefully a DBpedia live feed as well.

I hope that sounds reasonable. If not, please let us know.

Cheers,
Georgi

--
Georgi Kobilarov
Freie Universität Berlin
www.georgikobilarov.com

> -----Original Message-----
> From: Omid [mailto:[email protected]]
> Sent: Thursday, June 18, 2009 10:45 PM
> To: Georgi Kobilarov
> Cc: [email protected]
> Subject: Re: [Dbpedia-discussion] DBPedia freshness
>
> Thanks Georgi,
>
> I have also noticed that Wikipedia has significantly increased the
> frequency with which they release their dumps. I remember there
> was a period from October 2008 to early this year when no new dumps
> were completed for 5-6 months.
> The question is how much manual work and how much processing time
> DBpedia needs to release a new dump once a new Wikipedia dump is
> released.
> Assuming that Wikipedia started releasing complete data dumps on a
> daily basis, would DBpedia theoretically be able to release dumps
> on a daily basis as well?
> Or does the processing itself require, for example, one week,
> making it impossible to keep DBpedia fresh daily even if Wikipedia
> released its data dumps daily?
>
> Basically, I am trying to figure out what the minimum delay would be
> between the release of a new Wikipedia dump and the release of a new
> DBpedia with the current DBpedia scripts.
> Also, if the process currently involves many manual steps (downloading
> the Wikipedia dump, processing the data, etc.), could it easily be
> automated so that keeping DBpedia fresh would not involve any human
> intervention?
>
> Thanks
> /Omid
>
>
> On Thu, Jun 18, 2009 at 12:20 PM, Georgi Kobilarov<[email protected]> wrote:
> > Hi Omid,
> >
> > there are several Wikipedia dump files we are importing in order to
> > extract the data for DBpedia (see the importwiki.php in the DBpedia
> > SVN).
> >
> > It is true that DBpedia is quite out of date at the moment. There has
> > been a lack of Wikipedia dumps during winter and spring, but Wikipedia
> > recently started to publish dumps much more frequently. We are currently
> > in the process of preparing DBpedia 3.3, based on a late May dump of the
> > English Wikipedia (and dumps of other languages around that time).
> >
> > I can only roughly estimate when DBpedia 3.3 will be available, but keep
> > an eye on the DBpedia mailing list around the end of next week...
> >
> > Cheers,
> > Georgi
> >
> > --
> > Georgi Kobilarov
> > Freie Universität Berlin
> > www.georgikobilarov.com
> >
> >> -----Original Message-----
> >> From: Omid [mailto:[email protected]]
> >> Sent: Thursday, June 18, 2009 9:00 PM
> >> To: [email protected]
> >> Subject: [Dbpedia-discussion] DBPedia freshness
> >>
> >> Can someone let me know which Wikipedia data dump file is
> >> the input to DBpedia?
> >>
> >> On http://wiki.dbpedia.org/Documentation it says "...all articles from
> >> the Wikipedia SQL-Dump...".
> >>
> >> Is it this one we are talking about?
> >> http://download.wikimedia.org/enwiki/latest/enwiki-latest-page.sql.gz
> >>
> >> Or is it another file that is being used as input into the DBpedia
> >> system?
> >>
> >> Also, I see that the latest dump of DBpedia is 8 months old (from
> >> October 2008).
> >> Is there anything preventing DBpedia from creating a fresher dump from
> >> the data at http://download.wikimedia.org/enwiki/latest/?
> >> I'm curious to know whether the data is not fresh because someone
> >> actually has to manually download the Wikipedia data and run the
> >> scripts (and it has just not been done yet), or whether the issue is
> >> technical somehow and the process has failed with newer data.
> >>
> >>
> >> Thanks
> >> /Omid
> >>
> >>
> >> ------------------------------------------------------------------------------
> >> Crystal Reports - New Free Runtime and 30 Day Trial
> >> Check out the new simplified licensing option that enables unlimited
> >> royalty-free distribution of the report engine for externally facing
> >> server and web deployment.
> >> http://p.sf.net/sfu/businessobjects
> >> _______________________________________________
> >> Dbpedia-discussion mailing list
> >> [email protected]
> >> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
