[DBpedia-discussion] ANN: DBpedia 2016-10-beta release

Markus Freudenberg Fri, 28 Apr 2017 07:49:11 -0700

Dear all,

Another DBpedia release is dawning and soon will be published in full.


In the meantime, we will forward the main body of data for the coming
2016-10 release:

http://downloads.dbpedia.org/2016-10/

This release cycle took somewhat longer than the previous ones for several
reasons:


   1. We started extracting the full text of each wiki page, in addition to
      the abstracts, in the NLP Interchange Format (NIF
      <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/nif-core.html>),
      providing the readable text structured in sections and paragraphs, along
      with all in-text links (see here
      <http://downloads.dbpedia.org/2016-04/ext/nif-abstracts/>; careful: the
      datasets linked on this page represent data of the last release). The
      additional (nif-) datasets will double the size of the released data
      dumps.

   2. We are preparing a major overhaul of the data extraction procedure based
      on Apache Spark <http://spark.apache.org>, in cooperation with the
      Semantic Web Company <https://www.semantic-web.at>, which requires
      extensive refactoring of the current code base.

   3. We focused on actively gathering incoming links from other datasets to
      return the favour by turning them around as outgoing links. This is an
      ongoing process, which will update the links on a monthly basis:
      <http://downloads.dbpedia.org/links/2017-04-01/dbpedia.org/>
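
To illustrate what "turning links around" in the third point could look like,
here is a minimal Python sketch. It is not taken from the extraction
framework: the helper name invert_links, the example.org URIs, and the
restriction to owl:sameAs (a symmetric predicate, so reversing it is safe) are
all assumptions made for illustration.

```python
# Hypothetical sketch, not part of the DBpedia extraction framework.
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def invert_links(triples, dbpedia_prefix="http://dbpedia.org/resource/"):
    """Turn incoming (external -> DBpedia) links into outgoing ones.

    Assumption: only owl:sameAs links are reversed, because the predicate
    is symmetric; all other predicates are skipped.
    """
    outgoing = []
    for subject, predicate, obj in triples:
        if predicate == SAME_AS and obj.startswith(dbpedia_prefix):
            # Swap subject and object so DBpedia becomes the link source.
            outgoing.append((obj, predicate, subject))
    return outgoing


# Made-up incoming links from an external dataset.
incoming = [
    ("http://example.org/resource/Berlin", SAME_AS,
     "http://dbpedia.org/resource/Berlin"),
    ("http://example.org/resource/Berlin", SEE_ALSO,
     "http://dbpedia.org/resource/Germany"),
]
outgoing_links = invert_links(incoming)
```

In this sketch only the owl:sameAs link is reversed; the rdfs:seeAlso link is
dropped because reversing it would change its meaning.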


Please have a closer look at the current status of the data, so we can
catch missing or odd data points before publishing the data.

What is still missing:

   - Additional types (SDTypes, Hypernyms, DBTax)
   - Additional datasets for DBpedia+
   - Release statistics
   - Download page
   - A public endpoint with the data


In case you missed the changes in the last release (2016-04):

   - In addition to the datasets normalized to English DBpedia (en-uris), we
     now provide normalized datasets based on the DBpedia Wikidata (DBw)
     datasets (wkd-uris). These sorted datasets will be the foundation for
     the upcoming fusion process with Wikidata. From the following releases
     on, the DBw-based URIs will be the only ones provided.

   - We now filter out triples from the Raw Infobox Extractor that are
     already mapped, e.g. no more “<x> dbo:birthPlace <z>” and “<x>
     dbp:birthPlace|dbp:placeOfBirth|... <z>” in the same resource. These
     triples are now moved to the “infobox-properties-mapped” datasets and
     are not loaded on the main endpoint. See issue 22
     <https://github.com/dbpedia/extraction-framework/issues/22> for more
     details.

   - Major improvements in our citation extraction. See here
     <http://www.mail-archive.com/[email protected]/msg07762.html>
     for more details.
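
As a rough illustration of the raw-infobox filtering mentioned above, the
following Python sketch splits raw triples into those that stay in the raw
dataset and those that would move to "infobox-properties-mapped". It rests on
an assumption: a raw triple counts as "already mapped" when a mapped triple
with the same subject and object exists. The actual criterion used by the
extraction framework may differ (see issue 22), and the function name
split_raw_triples is hypothetical.

```python
# Hypothetical sketch; the real extraction framework may decide differently.
def split_raw_triples(mapped, raw):
    """Split raw infobox triples into (kept, moved).

    Assumption: a raw triple is redundant if some mapped triple shares
    its subject and object, regardless of the property name.
    """
    mapped_pairs = {(s, o) for s, _, o in mapped}
    kept, moved = [], []
    for triple in raw:
        s, _, o = triple
        if (s, o) in mapped_pairs:
            moved.append(triple)  # would go to infobox-properties-mapped
        else:
            kept.append(triple)   # stays in the raw infobox dataset
    return kept, moved


mapped_triples = [("<x>", "dbo:birthPlace", "<z>")]
raw_triples = [
    ("<x>", "dbp:birthPlace", "<z>"),
    ("<x>", "dbp:placeOfBirth", "<z>"),
    ("<x>", "dbp:height", "180"),
]
kept, moved = split_raw_triples(mapped_triples, raw_triples)
```

Here both dbp:birthPlace and dbp:placeOfBirth end up in the moved list, while
dbp:height has no mapped counterpart and is kept.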



Markus, on behalf of the DBpedia extraction team.
