Hi all,
due to your requests, we can also post the short news messages from chat
and forum.dbpedia.org here, although they may seem rather colloquial
for a mailing list where formal announcements are normally posted. Anyhow, here it is:
we have now stabilised the release process. It was quite tough in the end. We
moved the Generic Extraction to SPARK and it runs in 2-3 days now for
140 languages. @marvinh wrapped the Jena parser into reactive streams
and we are now able to log all errors in the N-Triples files at a speed
of 8 million triples per minute (still optimising). Some months ago we
realised that parsing is not the only criterion. In RDF you
actually want the URIs to be not only valid, but also exactly the same
as before, e.g. `&` is allowed in the URI path and is used in DBpedia/Wikipedia, but
sometimes `%26` was used instead. So we wrote a CI test
https://forum.dbpedia.org/t/new-ci-tests-on-dbpedia-releases/77 and an
Eval Mod that parses all N-Triples files that anyone loads onto the
Databus and annotates them with the error rate.
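To make the two checks concrete, here is a minimal Python sketch. It is not the Jena-based parser or the actual CI test; the regular expression is a simplified stand-in for the N-Triples grammar (no blank nodes, for example), and the names `check_lines` and `UNWANTED_ESCAPES` are made up for illustration. It counts lines that do not parse and lines whose URIs contain `%26` where `&` was expected, and reports a combined error rate.

```python
import re

# Simplified patterns (assumption: NOT the full N-Triples grammar, no blank nodes).
IRI = r"<[^\x00-\x20<>\"{}|^`\\]*>"
LITERAL = r'"(?:[^"\\\n\r]|\\.)*"(?:\^\^' + IRI + r"|@[A-Za-z]+(?:-[A-Za-z0-9]+)*)?"
TRIPLE = re.compile(rf"^\s*({IRI})\s+({IRI})\s+({IRI}|{LITERAL})\s*\.\s*$")

# Hypothetical table of escapes that should not appear; the post only mentions %26.
UNWANTED_ESCAPES = {"%26": "&"}


def check_lines(lines):
    """Count syntax errors and unwanted percent-escapes in N-Triples lines."""
    total = syntax_errors = unwanted = 0
    for line in lines:
        # Skip blank lines and comments.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        total += 1
        match = TRIPLE.match(line)
        if match is None:
            syntax_errors += 1
            continue
        # Only look at URI terms, not at literal values.
        iris = [term for term in match.groups() if term.startswith("<")]
        if any(esc in iri for iri in iris for esc in UNWANTED_ESCAPES):
            unwanted += 1
    rate = (syntax_errors + unwanted) / total if total else 0.0
    return {
        "triples": total,
        "syntax_errors": syntax_errors,
        "unwanted_escapes": unwanted,
        "error_rate": rate,
    }


sample = [
    "<http://dbpedia.org/resource/AT&T> "
    "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    "<http://dbpedia.org/ontology/Company> .",
    "<http://dbpedia.org/resource/AT%26T> "
    '<http://www.w3.org/2000/01/rdf-schema#label> "AT&T"@en .',
    '<http://dbpedia.org/resource/Broken <http://example.org/p> "x" .',
]
report = check_lines(sample)
print(report)
```

In the sample, the first line is clean, the second is valid but uses `%26`, and the third has an unterminated URI, so only one of three lines passes both checks.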
Anyhow, these problems are solved and now need only some
tinkering/optimisation; they are reproducible. Here are the first
clean mapping extractions of their kind:
• https://databus.dbpedia.org/dbpedia/mappings/specific-mappingbased-properties/2019.08.01
• https://databus.dbpedia.org/dbpedia/mappings/geo-coordinates-mappingbased/2019.08.01
• https://databus.dbpedia.org/dbpedia/mappings/instance-types/2019.08.01
• https://databus.dbpedia.org/dbpedia/mappings/mappingbased-objects-uncleaned/2019.08.01
• https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals
The collection feature is now working to some extent, so we made one for the
English data:
https://databus.dbpedia.org/system/collection/kurzum/milestonefirst
(also as `?format=json`)
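For scripts that want the JSON variant, a small Python sketch of how the `?format=json` parameter could be appended to a collection URL. The helper name `with_format_json` is made up for illustration, and the structure of the returned JSON is not described in this post, so the sketch only builds the URL.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_format_json(url: str) -> str:
    """Return the URL with format=json set, keeping any existing query parameters."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query["format"] = "json"
    return urlunsplit(parts._replace(query=urlencode(query)))


collection = "https://databus.dbpedia.org/system/collection/kurzum/milestonefirst"
json_url = with_format_json(collection)
print(json_url)
```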
We will add generic and wikidata for 2019.08.01 next week; from then on, releases
should finish around the 7th of each month. Bugs in short/long abstracts are
fixed and they can run as well (although we do not know how long they
will take). By the way, the Eval Mod can be seen here:
https://databus.dbpedia.org/dbpedia/mappings/instance-types/2018.12.01
(scroll down until you see yellow, then click on the image). Fingers crossed
that these will be green for
https://databus.dbpedia.org/dbpedia/mappings/instance-types/2019.08.01
Next week we will document all data and all features better, to have
everything ready for
https://wiki.dbpedia.org/events/14th-dbpedia-community-meeting-karlsruhe
--
All the best,
Sebastian Hellmann
Director of Knowledge Integration and Linked Data Technologies (KILT)
Competence Center
at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org,
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt
Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
_______________________________________________
DBpedia-discussion mailing list
DBpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion