Hello,

Shug Boabby wrote:
> Hi everybody,
>
> I just came across DBpedia and it looks like a really awesome project.
> I noticed that the latest dump from the English Wikipedia is quite old
> (January 2008; Freebase does a new release every 3 months), so I am
> interested in creating a more up-to-date DBpedia from a recent
> download of the Wikipedia pages and articles (which I have). Is this
> possible?

Yes.

> I am most interested in knowing which Wikipedia pages are people,
> companies, and possible disambiguations. Perhaps if there are URLs of
> the person (or company logo) on Wikipedia, that would be good too.

You could use the YAGO hierarchy for this (which will be much improved
in the next DBpedia release). Using subclass inferencing in Virtuoso,
you can ask the DBpedia SPARQL endpoint for all individuals that are
instances of the class you are looking for, e.g. ask for instances of
http://dbpedia.org/class/yago/Person100007846 if you want to find all
persons. (Note that there are resource limits on the server, so I
believe it will only return 1000 results at a time. You need to ask
several queries to get all persons.)
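Untested, but a query along the following lines should work against the
public endpoint (treat the 1000-row page size as an assumption; the
exact limit is a server setting):

  PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

  SELECT ?person
  WHERE {
    ?person rdf:type <http://dbpedia.org/class/yago/Person100007846> .
  }
  LIMIT 1000
  OFFSET 0

Increase OFFSET by the page size (0, 1000, 2000, ...) and repeat until a
request returns fewer rows than the limit. If you also want picture
URLs, adding an OPTIONAL pattern on foaf:depiction should do it,
assuming the image dataset is loaded on the endpoint.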
> I downloaded the SVN from SourceForge, but I'm afraid I am a little
> lost as I was unable to find any documentation. I have a suspicion
> that the entry point is the extraction/extract.php file... but I have
> no idea where to put the bz2 Wikipedia dump file.

You need to import all the dumps into a MySQL database; there is an
import PHP script provided for this. (Also see other threads on this
list.) The extract.php file is the entry point for the complete
extraction process. You may alter it to extract only what you are
looking for. Some developer information about the framework can be
found at http://wiki.dbpedia.org/Documentation. There is,
unfortunately, no good documentation on creating your own DBpedia
release; it is a very time-consuming process.

> Additionally, I noticed that the PersonData preview link is broken.

I fixed it.

> This is disappointing as it is the one I am most interested in (so I
> had to download the full dataset). Is there any reason why this is
> only created from the German data?

There is no strict reason not to include it. I believe the extractor
had problems on the English Wikipedia, which need to be fixed. It
might be included in the next release.

Kind regards,

Jens

--
Dipl. Inf. Jens Lehmann
Department of Computer Science, University of Leipzig
Homepage: http://www.jens-lehmann.org
GPG Key: http://jens-lehmann.org/jens_lehmann.asc
