Hi Omid,

It is true (as Brian wrote) that the Wikimedia Foundation has offered us
their Wikipedia live feed. And we are in the process of developing and
deploying a real-time update version of DBpedia. Jens also wrote about
that recently on the mailing list.

The reason we need some time to prepare DBpedia 3.3 is that
there have been a number of changes to the DBpedia extraction framework
and some experiments with the code base, and we simply need to get things
together again. Usually, it takes around 1 day to process a whole
Wikipedia dump, including importing it into MySQL and running the
extraction.

But because of these code changes, we have been facing some bugs and
extraction errors. Nothing to worry about, but it takes time to get
up to speed again. And since DBpedia is still something of a spare-time
project for all participants, alongside our other research projects, we
don't always find the time to work on it.

In the long term, DBpedia will probably rely only on the Wikipedia
update feed instead of the Wikipedia dump files. There will be daily or
weekly diffs, monthly full dumps and hopefully a DBpedia live feed as
well.

I hope that sounds reasonable. If not, please let us know.

Cheers,
Georgi

--
Georgi Kobilarov
Freie Universität Berlin
www.georgikobilarov.com

> -----Original Message-----
> From: Omid [mailto:[email protected]]
> Sent: Thursday, June 18, 2009 10:45 PM
> To: Georgi Kobilarov
> Cc: [email protected]
> Subject: Re: [Dbpedia-discussion] DBPedia freshness
> 
> Thanks Georgi,
> 
> I have also noted that Wikipedia has significantly increased the
> frequency with which they release their dumps. I remember there
> was a period from October 2008 to early this year when no new dumps
> were completed for 5-6 months.
> The question is how much manual work and processing time DBpedia
> needs to release a new dump once a new Wikipedia dump is released.
> Assuming that Wikipedia started releasing complete data dumps on a
> daily basis, would DBpedia theoretically be able to release dumps
> on a daily basis as well?
> Or does the processing itself require, for example, one week, making
> it impossible to keep DBpedia fresh daily even if Wikipedia published
> fresh data dumps every day?
> 
> Basically, I am trying to figure out what the minimum delay would be
> between the release of a new Wikipedia dump and the release of a new
> DBpedia with the current DBpedia scripts.
> Also, if the process currently involves many manual steps (downloading
> the Wikipedia dump, processing the data, etc.), could it easily be
> automated so that keeping DBpedia fresh would not involve any human
> intervention?
> 
> Thanks
> /Omid
> 
> 
> On Thu, Jun 18, 2009 at 12:20 PM, Georgi
> Kobilarov<[email protected]> wrote:
> > Hi Omid,
> >
> > There are several Wikipedia dump files we are importing in order to
> > extract the data for DBpedia (see the importwiki.php in the DBpedia
> > SVN).
> >
> > It is true that DBpedia is quite out of date at the moment. There has
> > been a lack of Wikipedia dumps during winter and spring, but Wikipedia
> > recently started to publish dumps much more frequently. We are currently
> > in the process of preparing DBpedia 3.3, based on a late May dump of the
> > English Wikipedia (and dumps of other languages around that time).
> >
> > I can only roughly estimate when DBpedia 3.3 will be available, but keep
> > an eye on the DBpedia mailing list around the end of next week...
> >
> > Cheers,
> > Georgi
> >
> > --
> > Georgi Kobilarov
> > Freie Universität Berlin
> > www.georgikobilarov.com
> >
> >> -----Original Message-----
> >> From: Omid [mailto:[email protected]]
> >> Sent: Thursday, June 18, 2009 9:00 PM
> >> To: [email protected]
> >> Subject: [Dbpedia-discussion] DBPedia freshness
> >>
> >> Can someone let me know which Wikipedia data dump file is the input
> >> to DBpedia?
> >>
> >> On http://wiki.dbpedia.org/Documentation it says "...all articles from
> >> the Wikipedia SQL-Dump...".
> >>
> >> Is it this one?
> >> http://download.wikimedia.org/enwiki/latest/enwiki-latest-page.sql.gz
> >>
> >> Or is another file being used as input to the DBpedia system?
> >>
> >> Also, I see that the latest dump of DBpedia is 8 months old (from
> >> October 2008).
> >> Is there anything preventing DBpedia from creating a fresher dump from
> >> the data at http://download.wikimedia.org/enwiki/latest/?
> >> I'm curious whether the data is not fresh because someone has to
> >> manually download the Wikipedia data and run the scripts (and it has
> >> just not been done yet), or whether the issue is technical and the
> >> process has failed with newer data.
> >>
> >>
> >> Thanks
> >> /Omid
> >>
> >>
> >

_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
