Hi Hady,
This might be what we were waiting for :)
If no one else objects, can you create a Turtle dump and re-test / adapt
your existing ILL code?
Afterwards we can start the mappings process.
Best,
Dimitris
---------- Forwarded message ----------
From: Markus Krötzsch <[email protected]>
Date: Sat, Aug 3, 2013 at 4:48 PM
Subject: [Wikidata-l] Wikidata RDF export available
To: "Discussion list for the Wikidata project." <[email protected]>
Hi,
I am happy to report that an initial, yet fully functional RDF export for
Wikidata is now available. The exports can be created using the
wda-export-data.py script of the wda toolkit [1]. This script downloads
recent Wikidata database dumps and processes them to create RDF/Turtle
files. Various options are available to customize the output (e.g., to
export statements but not references, or to export only texts in English
and Wolof). The file creation takes about three hours on my machine,
depending on what exactly is exported.
For your convenience, I have created some example exports based on
yesterday's dumps. These can be found at [2]. There are three Turtle files:
site links only, labels/descriptions/aliases only, statements only. The
fourth file is a preliminary version of the Wikibase ontology that is used
in the exports.
The export format is based on our earlier proposal [3], but it adds a lot
of details that had not been specified there yet (namespaces, references,
ID generation, compound datavalue encoding, etc.). Details might still
change, of course. We might provide regular dumps at another location once
the format is stable.
As a side effect of these activities, the wda toolkit [1] is also getting
more convenient to use. Creating code for exporting the data into other
formats is quite easy.
Features and known limitations of the wda RDF export:
(1) All current Wikidata datatypes are supported. Commons-media data is
correctly exported as URLs (not as strings).
(2) One-pass processing. Dumps are processed only once, even though this
means that we may not know the types of all properties when we first need
them: the script queries wikidata.org to find missing information. This is
only relevant when exporting statements.
(3) Limited language support. The script uses Wikidata's internal language
codes for string literals in RDF. In some cases, this might not be correct.
It would be great if somebody could create a mapping from Wikidata language
codes to BCP47 language codes (let me know if you think you can do this,
and I'll tell you where to put it).
(4) Limited site language support. To specify the language of linked wiki
sites, the script extracts a language code from the URL of the site. Again,
this might not be correct in all cases, and it would be great if somebody
had a proper mapping from Wikipedias/Wikivoyages to language codes.
(5) Some data excluded. Data that cannot currently be edited is not
exported, even if it is found in the dumps. Examples include statement
ranks and timezones for time datavalues. I also currently exclude labels
and descriptions for simple English, formal German, and informal Dutch,
since these would pollute the label space for English, German, and Dutch
without adding much benefit (other than possibly for simple English
descriptions, I cannot see any case where these languages should ever have
different Wikidata texts at all).
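For limitations (3) and (4), here is a minimal sketch of what such mappings could look like. It is written for Python 3 and is not part of the wda toolkit: the names WIKIDATA_TO_BCP47, to_bcp47 and site_language are hypothetical, and the table holds only a few illustrative entries that should be checked against the BCP47 subtag registry before use.

```python
from urllib.parse import urlparse

# Hypothetical, incomplete table: Wikimedia-internal code -> BCP47 tag.
# Illustrative entries only; verify each against the IANA subtag registry.
WIKIDATA_TO_BCP47 = {
    "zh-yue": "yue",
    "zh-min-nan": "nan",
    "zh-classical": "lzh",
    "be-x-old": "be-tarask",
    "als": "gsw",
}


def to_bcp47(code):
    """Return the mapped BCP47 tag, or the code unchanged if no entry exists."""
    return WIKIDATA_TO_BCP47.get(code, code)


def site_language(url):
    """Guess a site's language from the first label of its host name.

    This mirrors the URL-based heuristic described in (4). It is wrong for
    hosts whose first label is not a language, e.g. commons.wikimedia.org.
    """
    host = urlparse(url).hostname or ""
    return to_bcp47(host.split(".")[0])
```

With this sketch, site_language("http://als.wikipedia.org/wiki/X") goes through the table, while a host such as commons.wikimedia.org yields "commons". That is the kind of case where the URL heuristic fails and an explicit site-to-language mapping would be needed.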
Feedback is welcome.
Cheers,
Markus
[1] https://github.com/mkroetzsch/wda
Run "python wda-export-data.py --help" for usage instructions.
[2] http://semanticweb.org/RDF/Wikidata/
[3] http://meta.wikimedia.org/wiki/Wikidata/Development/RDF
--
Markus Kroetzsch, Departmental Lecturer
Department of Computer Science, University of Oxford
Room 306, Parks Road, OX1 3QD Oxford, United Kingdom
+44 (0)1865 283529 http://korrekt.org/
--
Kontokostas Dimitris