(originally announced yesterday on dbpedia-discussion@...)
We hereby announce the release of DBpedia 2015-10 (also known as 2015 B).
This DBpedia release is based on updated Wikipedia dumps dating from
October 2015, featuring a significantly expanded base of information as well
as richer and (hopefully) cleaner data conforming to the DBpedia ontology.
You can download the new DBpedia datasets in RDF format from
http://wiki.dbpedia.org/Downloads2015-10 or directly here:
http://downloads.dbpedia.org/2015-10/.
Statistics
The English version of the DBpedia knowledge base currently describes 6.2M
things, of which 4.6M have abstracts, 955K have geo coordinates and 1.54M
have depictions. In total, 5M resources are classified in a consistent
ontology, consisting of 1.6M persons, 800K places (including 500K populated
places), 480K works (including 133K music albums, 102K films and 20K video
games), 267K organizations (including 66K companies and 52K educational
institutions), 293K species and 5K diseases. The total number of resources
in English DBpedia is 16.4M, which, besides the 4.6M resources with
abstracts, includes 1.3M SKOS concepts (categories), 7.1M redirect pages,
254K disambiguation pages and 1.6M intermediate nodes.
Altogether the DBpedia 2015-10 release consists of 8.8 billion (2015-04:
6.9 billion) pieces of information (RDF triples) out of which 1.1 billion
(2015-04: 737 million) were extracted from the English edition of
Wikipedia, 4.4 billion (2015-04: 3.8 billion) were extracted from other
language editions and 3.2 billion (2015-04: 2.4 billion) from DBpedia
Commons and Wikidata. In general we observed a significant growth in raw
infobox and mapping-based statements of close to 10%.
Thorough statistics can be found on the DBpedia website
<http://wiki.dbpedia.org/services-resources/datasets/dataset-2015-10/dataset-2015-10-statistics>
and general information on the DBpedia datasets here
<http://wiki.dbpedia.org/services-resources/datasets/dbpedia-datasets>.
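The triple counts quoted above can be turned into relative growth rates with
a little arithmetic. The sketch below is only an illustration: the counts are
copied from this announcement (in billions), while the dictionary labels and
the helper name growth_percent are made up for the example. Note that the
growth of close to 10% reported for raw infobox and mapping-based statements
refers to specific datasets and cannot be derived from these totals.

```python
# Triple counts in billions, copied from the announcement: (2015-04, 2015-10).
TRIPLES_BILLIONS = {
    "all triples": (6.9, 8.8),
    "English Wikipedia": (0.737, 1.1),
    "other language editions": (3.8, 4.4),
    "DBpedia Commons and Wikidata": (2.4, 3.2),
}


def growth_percent(old: float, new: float) -> float:
    """Relative growth from old to new, in percent."""
    return (new - old) / old * 100.0


# Growth per category, keyed by the (illustrative) labels above.
growth = {
    name: growth_percent(old, new)
    for name, (old, new) in TRIPLES_BILLIONS.items()
}

for name, pct in growth.items():
    print(f"{name}: {pct:.1f}% more triples than in 2015-04")
```

The three partial counts add up to 8.7 billion rather than 8.8 billion,
presumably because each figure was rounded independently.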
Community
The DBpedia community added new classes and properties to the DBpedia
ontology via the mappings wiki. The DBpedia 2015-10 ontology encompasses:
- 739 classes (DBpedia 2015-04: 735)
- 1,099 object properties (DBpedia 2015-04: 1,098)
- 1,596 datatype properties (DBpedia 2015-04: 1,583)
- 132 specialized datatype properties (DBpedia 2015-04: 132)
- 407 owl:equivalentClass and 222 owl:equivalentProperty mappings to
  external vocabularies (DBpedia 2015-04: 408 and 200)
The editor community of the mappings wiki also defined many new mappings
from Wikipedia templates to DBpedia classes. For the DBpedia 2015-10
extraction, we used a total of 5,553 template mappings (DBpedia 2015-04:
4,317 mappings). For the first time, the top language, gauged by number of
mappings, is Dutch (606 mappings), surpassing the English community (600
mappings).
(Breaking) Changes
- English DBpedia switched from URIs to IRIs. Some URIs will not resolve,
  and we provide the “uri-same-as-iri” dataset for English to ease the
  transition. For more technical details on this issue, read section 6 of
  <http://svn.aksw.org/papers/2011/DBpedia_I18n/public.pdf>, p. 19-23 (old
  but still valid).
- The instance-types dataset is now split into two files:
  - instance-types (containing only direct types)
  - instance-types-transitive (containing the transitive types of a
    resource based on the DBpedia ontology)
- The mappingbased-properties file is now split into three (3) files:
  - “geo-coordinates-mappingbased”, which contains the coordinates
    originating from the mappings wiki. The “geo-coordinates” dataset
    continues to provide the coordinates originating from the GeoExtractor.
  - “mappingbased-literals”, which contains mapping-based facts with
    literal values
  - “mappingbased-objects”, which contains mapping-based facts with object
    values
  - “mappingbased-objects-disjoint-[domain|range]”, which contain facts
    that were filtered out of the “mappingbased-objects” dataset as errors
    but are still provided
- We added a new extractor for citation data.
- All datasets are available in the .ttl and .tql serializations (the .nt
  and .nq datasets were omitted for reasons of redundancy and server
  capacity).
- We are providing DBpedia as a Docker image. Dockerized-DBpedia
  <https://github.com/dbpedia/Dockerized-DBpedia> creates and runs a
  Virtuoso Open Source instance preloaded with the latest DBpedia dataset
  inside a Docker container.
- Starting with this release, we provide extensive dataset metadata by
  adding DataIDs <http://dbpedia.org/projects/dbpedia-dataid> for all
  extracted languages to the respective language directories.
- In addition, we revamped the dataset table on the download page
  <http://wiki.dbpedia.org/Downloads2015-10>. It is created dynamically
  based on the DataIDs of all languages. Likewise, the tables on the
  statistics page
  <http://wiki.dbpedia.org/services-resources/datasets/dataset-2015-10/dataset-2015-10-statistics>
  are now based on files <http://downloads.dbpedia.org/2015-10/statistics/>
  providing information about all mapping languages.
- From now on, we also include the original Wikipedia dump files
  (‘pages_articles.xml.bz2’) alongside the extracted datasets.
- A complete changelog can always be found in the git log
  <https://github.com/dbpedia/extraction-framework/compare/DBpedia_2015-04...master>.
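The switch from URIs to IRIs in the list above largely comes down to how
non-ASCII characters in resource names are written: percent-encoded in a URI,
as plain Unicode in an IRI. As a rough illustration, the sketch below decodes
only those escapes that form valid UTF-8 sequences and leaves ASCII escapes
such as %2F untouched. The function name uri_to_iri and the example resource
names are assumptions made for this example. It is a simplification, not the
conversion code of the extraction framework, and the “uri-same-as-iri”
dataset remains the authoritative mapping.

```python
import re

# One or more consecutive percent-escapes whose byte value is >= 0x80,
# i.e. candidates for UTF-8 encoded non-ASCII characters.
_NON_ASCII_ESCAPES = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def uri_to_iri(uri: str) -> str:
    """Decode percent-escaped UTF-8 sequences; keep everything else as-is."""

    def decode(match):
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            # Not valid UTF-8: leave the original escapes in place.
            return match.group(0)

    return _NON_ASCII_ESCAPES.sub(decode, uri)


# Example resource names chosen for illustration only.
print(uri_to_iri("http://dbpedia.org/resource/Bras%C3%ADlia"))
print(uri_to_iri("http://dbpedia.org/resource/AC%2FDC"))
```

Escapes of reserved ASCII characters are deliberately kept, since decoding
them could change how the identifier is parsed.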
Upcoming Changes
- We are working to move away from the mappings wiki, but we will have at
  least one more mapping sprint.
- We have some cool ideas <http://wiki.dbpedia.org/ideas/> for GSoC this
  year. Additional mentors are more than welcome :)
Extended Type System to cover Articles without Infobox
Until the DBpedia 3.8 release, a concept was only assigned a type (like
person or place) if the corresponding Wikipedia article contained an infobox
indicating this type. Starting from the 3.9 release, we provide type
statements for articles without an infobox that are inferred based on the
link structure within the DBpedia knowledge base using the algorithm
described in Paulheim/Bizer 2014
<http://www.heikopaulheim.com/documents/ijswis_2014.pdf>.
For the new release, an improved version of the algorithm was run to
produce type information for 400,000 things that were formerly not typed. A
similar algorithm (presented in the same paper) was used to identify and
remove potentially wrong statements from the knowledge base.
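To give a feel for what inferring types from link structure can mean, here is
a deliberately simplified voting sketch. It is not the algorithm from the
paper cited above: the probabilities, property weights, threshold and the
function name infer_types are all invented for illustration. It shows only
one plausible reading of the idea, namely that the properties linking to an
untyped resource act as weighted evidence for its type.

```python
from collections import defaultdict

# Hypothetical share of resources with a given type among the objects of
# each property. All numbers are invented for this example.
TYPE_GIVEN_INCOMING_PROPERTY = {
    "dbo:birthPlace": {"dbo:Place": 0.95, "dbo:Person": 0.01},
    "dbo:location": {"dbo:Place": 0.90, "dbo:Organisation": 0.05},
    "dbo:author": {"dbo:Person": 0.85, "dbo:Organisation": 0.10},
}

# Hypothetical weights: how much each property is trusted as a type hint.
PROPERTY_WEIGHT = {
    "dbo:birthPlace": 0.9,
    "dbo:location": 0.6,
    "dbo:author": 0.8,
}


def infer_types(incoming_properties, threshold=0.5):
    """Return {type: score} for types whose weighted vote reaches threshold."""
    scores = defaultdict(float)
    total_weight = 0.0
    for prop in incoming_properties:
        distribution = TYPE_GIVEN_INCOMING_PROPERTY.get(prop)
        if distribution is None:
            continue  # unknown property: no evidence either way
        weight = PROPERTY_WEIGHT[prop]
        total_weight += weight
        for type_name, probability in distribution.items():
            scores[type_name] += weight * probability
    if total_weight == 0.0:
        return {}
    # Normalise by the total weight so scores stay between 0 and 1.
    normalised = {t: s / total_weight for t, s in scores.items()}
    return {t: s for t, s in normalised.items() if s >= threshold}


# An untyped resource that is the object of birthPlace and location links.
print(infer_types(["dbo:birthPlace", "dbo:location"]))
```

With these toy numbers, only dbo:Place clears the threshold; raising or
lowering the threshold trades coverage against precision.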
In addition, this release includes four new type datasets, although they
are not included in the online SPARQL endpoint: 1) LHD datasets
<http://ner.vse.cz/datasets/linkedhypernyms/> for English, German and Dutch
and 2) DBTax
<http://it.dbpedia.org/2015/02/dbpedia-italiana-release-3-4-wikidata-e-dbtax/>
for English.
Both of these datasets use a typing system beyond the DBpedia ontology. For
each, we provide a subset mapped to the DBpedia ontology (dbo) and a full
version with all types (ext).
Credits
Lots of thanks to
- Markus Freudenberg (University of Leipzig / DBpedia Association) for
  taking over the whole release process and creating the revamped download
  & statistics pages.
- Dimitris Kontokostas (University of Leipzig / DBpedia Association) for
  conveying his considerable knowledge of the extraction and release
  process.
- Volha Bryl (University of Mannheim / Springer) for their work on
  previous releases and their continuous support in this release.
- All editors that contributed to the DBpedia ontology mappings via the
  Mappings Wiki.
- The whole DBpedia Internationalization Committee for pushing the DBpedia
  internationalization forward.
- Heiko Paulheim (University of Mannheim) for re-running his algorithm to
  generate additional type statements for formerly untyped resources and
  to identify and remove wrong statements.
- Václav Zeman and the whole LHD team (University of Prague) for their
  contribution of additional DBpedia types.
- Marco Fossati (FBK) for contributing the DBTax types.
- Alan Meehan (TCD) for performing a big external link cleanup.
- Aldo Gangemi (LIPN University, France & ISTC-CNR, Italy) for providing
  the links from DOLCE to the DBpedia ontology.
- Kingsley Idehen, Patrick van Kleef, and Mitko Iliev (all OpenLink
  Software) for loading the new data set into the Virtuoso instance that
  provides 5-Star Linked Open Data publication and SPARQL Query Services.
- OpenLink Software (http://www.openlinksw.com/) altogether for providing
  the SPARQL Query Services and Linked Open Data publishing infrastructure
  for DBpedia in addition to their continuous infrastructure support.
- Ruben Verborgh from Ghent University – iMinds for publishing the dataset
  as Triple Pattern Fragments <http://fragments.dbpedia.org/>, and iMinds
  for sponsoring DBpedia’s Triple Pattern Fragments server.
- Ali Ismayilov (University of Bonn) for extending the DBpedia Wikidata
  dataset.
- Vladimir Alexiev (Ontotext) for leading a successful mapping and
  ontology cleanup effort.
- All the GSoC students and mentors working directly or indirectly on the
  DBpedia release.
- Special thanks to members of the DBpedia Association
  <http://dbpedia.org/dbpedia-association>, the AKSW
  <http://aksw.org/About.html> and the department for Business Information
  Systems <http://bis.informatik.uni-leipzig.de/en/Welcome> of the
  University of Leipzig.
The work on the DBpedia 2015-10 release was financially supported by the
European Commission through the project ALIGNED – quality-centric, software
and data engineering (http://aligned-project.eu/).
More information about DBpedia can be found at http://dbpedia.org as well as
in the new overview article about the project available at
http://wiki.dbpedia.org/Publications.
Have fun with the new DBpedia 2015-10 release!
Cheers,
Markus Freudenberg, Dimitris Kontokostas, Sebastian Hellmann
_______________________________________________
Dbpedia-developers mailing list
Dbpedia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-developers