Hello everyone,

As you might have noticed, we had some troubling issues with the abstract files in general and the English abstracts in particular.
We have remedied those issues by rerunning the full abstract extraction for the 10 languages most affected (de, en, es, fr, it, ja, ko, nl, pl, pt).

We also used this as an opportunity to test the NLP Interchange Format (NIF) <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/nif-core.html> extraction on the abstracts of those languages, extracting three new datasets in the process:

- *nif-context*: the full text of a page as context (including begin and end index)
- *nif-page-structure*: the structure of the page in sections and paragraphs (titles, subsections etc.)
- *nif-text-links*: all in-text links to other DBpedia resources as well as external references

For this test run, the context includes only the first section (the abstract) of every page. We are trying (hopefully by the next release) to extend the context to the full text of all Wikipedia pages, portraying its structure and providing the foundation for future NLP fact extraction tasks.

You can download these files from here <http://wiki.dbpedia.org/nif-abstract-datasets> or directly here <http://downloads.dbpedia.org/2016-04/ext/nif-abstracts/>.

Furthermore, Magnus discovered that all Wikidata normalized files (wkd_uris) for the English language edition had faulty predicates, so we regenerated these as well.

We hope this measure covers all shortcomings of the last release.

Please note: Patrick from OpenLink is still in the process of updating the public endpoint of DBpedia with the new abstracts while I'm writing this message.

Markus Freudenberg
Release Manager, DBpedia <http://wiki.dbpedia.org>
------------------------------------------------------------------------------
_______________________________________________
DBpedia-developers mailing list
DBpedia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-developers