Dear all,
Another DBpedia release is on its way and will soon be published in full.
In the meantime, we are making the main body of data for the upcoming
2016-10 release available in advance:
http://downloads.dbpedia.org/2016-10/
This release cycle took somewhat longer than previous ones, for several
reasons:
1. We started extracting the full text of each wiki page, in addition to
the abstracts, in the NLP Interchange Format (NIF
<http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/nif-core.html>),
providing the readable text structured in sections and paragraphs together
with all in-text links (see here
<http://downloads.dbpedia.org/2016-04/ext/nif-abstracts/>; note that the
datasets linked on this page are from the previous release). The
additional (nif-) datasets will double the size of the released data dumps.
2. We are preparing a major overhaul of the data extraction procedure based
on Apache Spark <http://spark.apache.org>, in cooperation with the Semantic
Web Company <https://www.semantic-web.at>, which requires extensive
refactoring of the current codebase.
3. We focused on actively gathering incoming links from other datasets to
return the favour by turning them around as outgoing links. This is an
ongoing process, and the links will be updated on a monthly basis:
<http://downloads.dbpedia.org/links/2017-04-01/dbpedia.org/>
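As a rough illustration of what "turning around" an incoming link means,
here is a minimal Python sketch. It assumes the incoming links are
owl:sameAs statements that point at DBpedia resources. The function name
and the example.org URI are made up for illustration and are not taken from
the actual link pipeline.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
DBPEDIA_PREFIX = "http://dbpedia.org/resource/"


def reverse_incoming_links(triples):
    """Hypothetical helper: turn (external, owl:sameAs, dbpedia)
    into (dbpedia, owl:sameAs, external)."""
    outgoing = []
    for subject, predicate, obj in triples:
        # Assumption: only owl:sameAs is reversed, because it is symmetric.
        is_incoming = (
            predicate == OWL_SAME_AS
            and obj.startswith(DBPEDIA_PREFIX)
            and not subject.startswith(DBPEDIA_PREFIX)
        )
        if is_incoming:
            outgoing.append((obj, predicate, subject))
    return outgoing


# Illustrative input; the example.org resource is not a real dataset.
incoming = [
    ("http://example.org/id/Berlin", OWL_SAME_AS,
     "http://dbpedia.org/resource/Berlin"),
    ("http://example.org/id/Berlin", RDFS_SEE_ALSO,
     "http://dbpedia.org/resource/Berlin"),
]
outgoing = reverse_incoming_links(incoming)
```

Only the owl:sameAs statement is reversed here. A predicate that is not
symmetric, such as rdfs:seeAlso, would change its meaning if subject and
object were swapped, so the sketch leaves it out.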
Please have a closer look at the current status of the data, so we can
catch missing or odd data points before publishing the data.
What is still missing:
- Additional types (SDTypes, Hypernyms, DBTax)
- Additional datasets for DBpedia+
- Release statistics
- Download page
- Public endpoint with the data
In case you missed the changes in the last release (2016-04):
- In addition to the datasets normalized to English DBpedia (en-uris), we
now provide normalized datasets based on the DBpedia Wikidata (DBw)
datasets (wkd-uris). These sorted datasets will be the foundation for
the upcoming fusion process with Wikidata. In subsequent releases, only
the DBw-based URIs will be provided.
- We now filter out triples from the Raw Infobox Extractor that are
already mapped, e.g. no more “<x> dbo:birthPlace <z>” and “<x>
dbp:birthPlace|dbp:placeOfBirth|... <z>” in the same resource. These
triples are now moved to the “infobox-properties-mapped” datasets and are
not loaded on the main endpoint. See issue 22
<https://github.com/dbpedia/extraction-framework/issues/22> for more
details.
- Major improvements in our citation extraction. See here
<http://www.mail-archive.com/dbpedia-discussion@lists.sourceforge.net/msg07762.html>
for more details.
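To illustrate the Raw Infobox Extractor filter mentioned in the list above,
here is a minimal Python sketch. It assumes that a raw triple counts as
"already mapped" when a mapped triple with the same subject and object
exists. The function name, this matching rule and the dbp:occupation example
are assumptions for illustration, not the extraction framework's actual code.

```python
def split_raw_infobox_triples(mapped_triples, raw_triples):
    """Hypothetical helper: separate raw infobox triples into those to keep
    and those to move to an "infobox-properties-mapped" dataset."""
    # Assumption: subject/object equality is what marks a raw triple as
    # duplicating a mapped one.
    covered = {(s, o) for s, _, o in mapped_triples}
    kept, moved = [], []
    for triple in raw_triples:
        s, _, o = triple
        if (s, o) in covered:
            moved.append(triple)
        else:
            kept.append(triple)
    return kept, moved


# Placeholder resources <x>, <y>, <z> follow the example in the mail.
mapped = [("<x>", "dbo:birthPlace", "<z>")]
raw = [
    ("<x>", "dbp:birthPlace", "<z>"),
    ("<x>", "dbp:placeOfBirth", "<z>"),
    ("<x>", "dbp:occupation", "<y>"),  # invented property, not mapped here
]
kept, moved = split_raw_infobox_triples(mapped, raw)
```

In this sketch the two birth-place triples end up in `moved`, while the
unmapped triple stays in `kept`.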
Markus, on behalf of the DBpedia extraction team.