Hello everyone,

As you might have noticed, we had some troubling issues with the abstract
files in general and the English abstracts in particular.

We have remedied these issues by rerunning the full abstract extraction for
the 10 languages most affected (de, en, es, fr, it, ja, ko, nl, pl, pt).

We also used this as an opportunity to test the NLP Interchange Format (NIF)
<http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/nif-core.html>
extraction on the abstracts of these languages, producing three new datasets
in the process:

   - *nif-context*: the full text of a page as context (including begin and
   end indices)
   - *nif-page-structure*: the structure of the page in sections and
   paragraphs (titles, subsections etc.)
   - *nif-text-links*: all in-text links to other DBpedia resources as well
   as external references

For this test run, the context includes only the first section (the
abstract) of every page. We hope to extend it, possibly by the next release,
to the full text of all Wikipedia pages, capturing their structure and
providing a foundation for future NLP fact extraction tasks.

You can find these files on the dataset page
<http://wiki.dbpedia.org/nif-abstract-datasets> or download them directly
from <http://downloads.dbpedia.org/2016-04/ext/nif-abstracts/>.

Furthermore, Magnus discovered that all Wikidata-normalized files
(wkd_uris) for the English language edition had faulty predicates, so we
regenerated these as well.

We hope these measures address all shortcomings of the last release.

Please note: as I write this message, Patrick from OpenLink is still
updating the public DBpedia endpoint with the new abstracts.

Markus Freudenberg

Release Manager, DBpedia <http://wiki.dbpedia.org>
_______________________________________________
DBpedia-developers mailing list
DBpedia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-developers