Hi all,

Due to your requests, we can also post the short news messages from chat and forum.dbpedia.org here, although they may seem rather colloquial for a mailing list where formal announcements are normally posted. Anyhow, here it is:

We have now stabilised the release process. It was quite tough in the end. We moved the Generic Extraction to Spark, and it now runs in 2-3 days for 140 languages. @marvinh wrapped the Jena parser into reactive streams, so we can now log all errors in the ntriples files at a speed of 8 million triples per minute (still optimising).

Some months ago we realised that parsing is not the only criterion. In RDF you want the URIs to be not only valid, but also exactly the same as before, e.g. `&` is allowed in the URI path and in DBpedia/Wikipedia, but sometimes `%26` was used. So we wrote a CI test https://forum.dbpedia.org/t/new-ci-tests-on-dbpedia-releases/77 and an Eval Mod that parses all ntriples files that anyone loads onto the Databus and annotates them with the error rate. Anyhow, these problems are solved and need only some tinkering/optimisation now; they are reproducible.

Here are the first clean mapping extractions of their kind:
https://databus.dbpedia.org/dbpedia/mappings/specific-mappingbased-properties/2019.08.01
https://databus.dbpedia.org/dbpedia/mappings/geo-coordinates-mappingbased/2019.08.01
https://databus.dbpedia.org/dbpedia/mappings/instance-types/2019.08.01
https://databus.dbpedia.org/dbpedia/mappings/mappingbased-objects-uncleaned/2019.08.01
https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals
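
To illustrate the `&` versus `%26` issue mentioned above, here is a minimal Python sketch of a check for "valid but not identical" URIs. It is not the actual CI test. The set of characters kept unencoded (`KEEP_LITERAL`) is an assumption taken from the characters RFC 3986 allows in a path segment, the function names are hypothetical, and the example resource is only illustrative.

```python
import re
from urllib.parse import urlsplit

# Assumption: characters we prefer to see unencoded in the URI path
# (RFC 3986 sub-delims plus ":" and "@"). The real DBpedia rules may differ.
KEEP_LITERAL = set("!$&'()*+,;=:@")

_PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def _decode_if_allowed(match):
    """Decode one percent-escape, but only if it stands for a kept character."""
    char = chr(int(match.group(1), 16))
    return char if char in KEEP_LITERAL else match.group(0)


def canonicalise(uri):
    """Return the URI with needlessly escaped path characters decoded."""
    parts = urlsplit(uri)
    new_path = _PCT.sub(_decode_if_allowed, parts.path)
    return parts._replace(path=new_path).geturl()


def is_canonical(uri):
    """True if the path contains no escape that should be a literal character."""
    path = urlsplit(uri).path
    return _PCT.sub(_decode_if_allowed, path) == path


if __name__ == "__main__":
    uri = "http://dbpedia.org/resource/AT%26T"
    print(is_canonical(uri), canonicalise(uri))
```

Both spellings parse fine, which is why a parser alone does not catch the difference; a check of this kind compares the URI against one preferred spelling instead.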

The collection feature is now working to some extent, so we made one for the English data: https://databus.dbpedia.org/system/collection/kurzum/milestonefirst (also as `?format=json`)

We will add generic and wikidata for 2019.08.01 next week, and from then on releases should finish around the 7th of each month. Bugs in the short/long abstracts are fixed, so these can run as well (although we do not know yet how long they will take). By the way, the evalmod can be seen here: https://databus.dbpedia.org/dbpedia/mappings/instance-types/2018.12.01 (scroll down until you see yellow and click on the image). Fingers crossed that these will be green for https://databus.dbpedia.org/dbpedia/mappings/instance-types/2019.08.01
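
As a rough idea of what an error rate over an ntriples file means, here is a small Python sketch. It is not the evalmod itself, which uses the Jena parser; the regular expression below is a simplified approximation of the N-Triples grammar, and the names `TRIPLE` and `error_rate` are hypothetical.

```python
import re

# Simplified building blocks; a real parser such as Jena is much stricter.
IRI = r'<[^\x00-\x20<>"{}|^`\\]*>'
BNODE = r'_:[A-Za-z0-9_][A-Za-z0-9_.-]*'
LITERAL = (r'"(?:[^"\\\n\r]|\\.)*"'
           r'(?:\^\^' + IRI + r'|@[A-Za-z]+(?:-[A-Za-z0-9]+)*)?')

# One triple per line: subject, predicate, object, terminating dot.
TRIPLE = re.compile(
    rf'^\s*(?:{IRI}|{BNODE})\s+{IRI}\s+(?:{IRI}|{BNODE}|{LITERAL})\s*\.\s*(?:#.*)?$'
)


def error_rate(lines):
    """Share of non-empty, non-comment lines that do not look like a triple."""
    total = bad = 0
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are not counted
        total += 1
        if not TRIPLE.match(line):
            bad += 1
    return bad / total if total else 0.0


if __name__ == "__main__":
    sample = [
        '<http://a/s> <http://a/p> <http://a/o> .',
        '<http://a/s> <http://a/p> "x"@en .',
        '<http://a/s> <http://a p> <http://a/o> .',  # space inside the IRI
    ]
    print(error_rate(sample))
```

Since any iterable of lines works, an open file handle can be passed in directly, so a large file does not have to be loaded into memory first.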

Next week we will document all data and all features better, to have everything ready for https://wiki.dbpedia.org/events/14th-dbpedia-community-meeting-karlsruhe

--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) Competence Center
at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, http://linguistics.okfn.org, https://www.w3.org/community/ld4lt
Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
_______________________________________________
DBpedia-discussion mailing list
DBpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
