Hi Marko,

it's great that you're working on a Slovenian extraction! How did you modify the extractors? Maybe we can add your changes to the repository.

The definitions given directly in Wikipedia will be used for the live extraction (the group in Leipzig is working on that), while the definitions in the files are used to produce the dumps found on http://wiki.dbpedia.org/Downloads .

mapping.xls and rules.xls were replaced by mapping.csv and rules.csv. The first version of the CSV files contained the same data as the Excel files, but since then we have only updated the CSV files. They use the same "format": the columns have the same meanings as in the Excel files. They are described in dbpedia/ontology/docs/dbpedia_mapping.txt.

When you open the CSV files with OpenOffice, you will be asked for the character encoding, field separator and text delimiter used in the file. Set the character encoding to UTF-8, set the separator to ";" (semicolon) and uncheck all other separators, and make sure that the text delimiter is empty (the default is a quote). Excel needs similar settings.

When you adapt the mappings for the Slovenian Wikipedia, make sure that you only change the template URLs and template property names, but not the class names and ontology properties.

The main reason for replacing the .xls files was that working with a binary format like .xls is hard. Finding the differences between versions of such files is almost impossible, as is writing scripts that parse them. Our scripts that copy the mappings and rules to the database (dbpedia/ontology/mapping_db.php and dbpedia/ontology/rules_db.php) always worked on CSV files, which we had to export from OpenOffice or Excel first. Now we can avoid this extra step.

Cheers,
Christopher

On Tue, Sep 22, 2009 at 00:05, Marko Burjek <[email protected]> wrote:
> Hello!
>
> I want to use dbpedia to parse Slovenian wiki. I fixed and updated most of the
> extractors, that they are more translation friendly and now I want to use
> mappingBasedExtractor.
> The problem is that mapping.xls and rules.xls which I
> wanted to use to create the ontology were deleted in revision 1441 with log
> message "No longer needed". I googled and found this topic
> <http://www.mail-archive.com/[email protected]/msg00870.html>
> I want to know if I should use mapping.xls and rules.xls from previous
> revision and create ontology with them or wait for this new way of specifying
> mappings and how long would that probably be?
> I also see that *.csv files in ontology folder were updated even after *.xls
> files were removed but mapping.xls from last useful revision is the same as
> one I downloaded in June.
>
> Best regards,
> Marko
>

_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
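
The OpenOffice import settings described in the reply (UTF-8, semicolon as the only separator, no text delimiter) can also be mirrored when reading the CSV files from a script. The following is only a rough sketch in Python and not part of the DBpedia code base: the function name read_mapping_rows and the sample rows are hypothetical, and the real column meanings are the ones documented in dbpedia/ontology/docs/dbpedia_mapping.txt.

```python
import csv
import io


def read_mapping_rows(stream):
    """Yield non-empty rows from a semicolon-separated text stream.

    quoting=csv.QUOTE_NONE corresponds to leaving the text delimiter
    empty in the import dialog: quote characters are kept as ordinary
    data instead of being interpreted as field delimiters.
    """
    reader = csv.reader(stream, delimiter=";", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row:  # skip completely blank lines
            yield row


# Hypothetical sample data, NOT the real layout of mapping.csv.
sample = (
    'Infobox "Mesto";ime;City;name\n'
    "Infobox Športnik;rojstni_kraj;Athlete;birthPlace\n"
)

rows = list(read_mapping_rows(io.StringIO(sample)))
for row in rows:
    print(row)

# For a real file, open it as UTF-8 and let the csv module handle newlines:
#   with open("mapping.csv", encoding="utf-8", newline="") as f:
#       rows = list(read_mapping_rows(f))
```

Reading with quoting disabled matters here because a quote inside a field would otherwise be treated as a text delimiter and could merge or split columns.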
