It is great that DBPedia will get access to a live feed and can generate its data dumps much more frequently to keep them fresh. Daily or weekly freshness sounds reasonable to me.
Is this feed something specific to DBPedia that only you will get access to, or is it something everyone can access? Is this feed the same thing as these IRC feeds?
http://meta.wikimedia.org/wiki/IRC_channels#Recent_changes

On Thu, Jun 18, 2009 at 2:47 PM, Georgi Kobilarov<[email protected]> wrote:
> Hi Omid,
>
> it is true (as Brian wrote) that the Wikimedia Foundation has offered us
> their Wikipedia live feed. And we are in the process of developing and
> deploying a real-time update version of DBpedia. Jens also wrote about
> that recently on the mailing list.
>
> The reason that we need some time for preparing DBpedia 3.3 is that
> there have been a bunch of changes to the DBpedia extraction framework,
> some experiments with the code base, and we simply need to get things
> together again. Usually, it takes around 1 day to process a whole
> Wikipedia dump, including importing it into MySQL and running the
> extraction.
>
> But due to the bunch of code changes, we were facing some bugs and
> extraction errors. Nothing to worry about, but it requires time to get
> up to speed again. And since DBpedia still is kind of a spare time
> project for all participants among our other research projects, we don't
> always find the time to work on it.
>
> In the long-term, DBpedia will probably only rely on the Wikipedia
> update feed instead of the Wikipedia dump files. There will be daily or
> weekly diffs, monthly full dumps and hopefully a DBpedia live feed as
> well.
>
> I hope that sounds reasonable. If not, please let us know.
>
> Cheers,
> Georgi
>
> --
> Georgi Kobilarov
> Freie Universität Berlin
> www.georgikobilarov.com
>
>> -----Original Message-----
>> From: Omid [mailto:[email protected]]
>> Sent: Thursday, June 18, 2009 10:45 PM
>> To: Georgi Kobilarov
>> Cc: [email protected]
>> Subject: Re: [Dbpedia-discussion] DBPedia freshness
>>
>> Thanks Georgi,
>>
>> I have also noted that Wikipedia has significantly increased the
>> frequency with which they are releasing their dumps. I remember there
>> was a period from October 2008 to early this year when no new dumps
>> were completed for 5-6 months.
>> The question is how much manual work and how much processing time
>> DBPedia needs to release a new dump once a new Wikipedia dump is
>> released.
>> Assume that Wikipedia would start releasing complete data dumps on a
>> daily basis: would DBPedia theoretically be able to release dumps
>> on a daily basis as well?
>> Or does the processing itself require, for example, one week,
>> making it impossible to have DBPedia daily fresh even if Wikipedia
>> had its data dumps daily fresh?
>>
>> Basically I am trying to figure out what the minimum delay would be
>> from the release of a new Wikipedia dump to the release of a new
>> DBPedia with the current DBPedia scripts.
>> Also, if the process currently involves many manual steps (downloading
>> the Wikipedia dump, processing the data, etc.), is it something that
>> could very easily be automated so that keeping DBPedia fresh would not
>> involve any human intervention?
>>
>> Thanks
>> /Omid
>>
>>
>> On Thu, Jun 18, 2009 at 12:20 PM, Georgi Kobilarov<[email protected]> wrote:
>> > Hi Omid,
>> >
>> > there are several Wikipedia dump files we are importing in order to
>> > extract the data for DBpedia (see the importwiki.php in the DBpedia
>> > SVN).
>> >
>> > It is true that DBpedia is quite out of date at the moment.
>> > There has been a lack of Wikipedia dumps during winter and spring, but
>> > Wikipedia recently started to publish dumps much more frequently. We
>> > are currently in the process of preparing DBpedia 3.3, based on a late
>> > May dump of the English Wikipedia (and dumps of other languages around
>> > that time).
>> >
>> > I can only roughly estimate when DBpedia 3.3 will be available, but
>> > keep an eye on the DBpedia mailing list around the end of next week...
>> >
>> > Cheers,
>> > Georgi
>> >
>> > --
>> > Georgi Kobilarov
>> > Freie Universität Berlin
>> > www.georgikobilarov.com
>> >
>> >> -----Original Message-----
>> >> From: Omid [mailto:[email protected]]
>> >> Sent: Thursday, June 18, 2009 9:00 PM
>> >> To: [email protected]
>> >> Subject: [Dbpedia-discussion] DBPedia freshness
>> >>
>> >> Can someone let me know which Wikipedia data dump file it is that is
>> >> the input to DBPedia?
>> >>
>> >> On http://wiki.dbpedia.org/Documentation it says "...all articles
>> >> from the Wikipedia SQL-Dump...".
>> >>
>> >> Is it this one we talk about?
>> >> http://download.wikimedia.org/enwiki/latest/enwiki-latest-page.sql.gz
>> >>
>> >> Or is it another file that is being used as input into the DBPedia
>> >> system?
>> >>
>> >> Also, I see that the latest dump of DBPedia is 8 months old (from
>> >> October 2008).
>> >> Is there anything preventing DBPedia from creating a fresher dump
>> >> from the data at http://download.wikimedia.org/enwiki/latest/?
>> >> I'm curious to know whether the data is not fresh because someone
>> >> actually has to manually download the Wikipedia data and run the
>> >> scripts (and it has just not been done yet), or whether the issue
>> >> is technical somehow and the process has failed with newer data.
>> >>
>> >>
>> >> Thanks
>> >> /Omid
>> >>
>> >> _______________________________________________
>> >> Dbpedia-discussion mailing list
>> >> [email protected]
>> >> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
