Hi Marko,

It's great that you're working on a Slovenian extraction! How did you
modify the extractors? Maybe we can add your changes to the
repository.

The definitions given directly in Wikipedia will be used for the live
extraction (the group in Leipzig is working on that), while the
definitions in the files are used to produce the dumps found at
http://wiki.dbpedia.org/Downloads .

mapping.xls and rules.xls were replaced by mapping.csv and rules.csv.
The first version of the CSV files contained the same data as the Excel
files, but since then we have only updated the CSV files.

They use the same "format" - the columns have the same meanings as in
the Excel files. They are described in
dbpedia/ontology/docs/dbpedia_mapping.txt.

When you open the CSV files with OpenOffice, you will be asked for the
character encoding, field separator and text delimiter used in the file.
Set the character encoding to UTF-8, set the separator to ";" (semicolon)
and uncheck all other separators, and make sure that the text delimiter
is empty (the default is a quote). The same settings apply in Excel.
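
If you would rather read the files from a script than from a spreadsheet
program, the same three settings carry over. The following is only a
minimal Python sketch, not part of the DBpedia code: the function names
parse_rows and load_rows are made up for illustration, and the sample
line is invented rather than taken from mapping.csv.

```python
import csv
import io


def parse_rows(stream):
    """Parse semicolon-separated text that has no text delimiter.

    QUOTE_NONE tells the reader to treat quote characters as ordinary
    data, which mirrors leaving the text delimiter empty in OpenOffice.
    """
    reader = csv.reader(stream, delimiter=";", quoting=csv.QUOTE_NONE)
    return [row for row in reader]


def load_rows(path):
    """Read a CSV file as UTF-8 and return its rows as lists of strings."""
    # newline="" is what the csv module recommends for file objects.
    with open(path, encoding="utf-8", newline="") as handle:
        return parse_rows(handle)


if __name__ == "__main__":
    # Invented sample data; the real column layout is described in
    # dbpedia/ontology/docs/dbpedia_mapping.txt.
    sample = 'first;"second";third\n'
    for row in parse_rows(io.StringIO(sample)):
        print(row)
```

Because quotes are not treated as delimiters, a field such as "second"
keeps its quote characters instead of having them stripped.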

When you adapt the mappings for the Slovenian Wikipedia, make sure that
you only change the template URLs and template property names, but not
the class names and ontology properties.

The main reason for replacing the .xls files was that working with a
binary format like .xls is hard. Finding the differences between
different versions of such files is almost impossible, as is writing
scripts that parse them. Our scripts that copy the mappings and rules to
the database (dbpedia/ontology/mapping_db.php and
dbpedia/ontology/rules_db.php) always worked on CSV files, which we had
to export from OpenOffice or Excel first. Now we can avoid this extra
step.

Cheers,
Christopher

On Tue, Sep 22, 2009 at 00:05, Marko Burjek <[email protected]> wrote:
> Hello!
>
> I want to use dbpedia to parse Slovenian wiki. I fixed and updated most of the
> extractors so that they are more translation friendly, and now I want to use
> mappingBasedExtractor. The problem is that mapping.xls and rules.xls which I
> wanted to use to create the ontology were deleted in revision 1441 with log
> message "No longer needed". I googled and found this topic
> <http://www.mail-archive.com/[email protected]/msg00870.html>
> I want to know if I should use mapping.xls and rules.xls from previous
> revision and create ontology with them or wait for this new way of specifying
> mappings and how long would that probably be?
> I also see that *.csv files in ontology folder were updated even after *.xls
> files were removed but mapping.xls from last useful revision is the same as
> one I downloaded in June.
>
> Best regards,
> Marko
>

_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
