Regarding the DBPedia citations & references challenge, we report on a 
project that aims to map DBPedia's citations to existing bibliographical 
data. Even though the deadline for the challenge has passed, we would be 
grateful for your feedback on the project.


More specifically, a number of properties in the 
enwiki-20160305-citation-data.ttl file have been used to link the subjects of 
its triples to URIs from other bibliographical sources. As a result, a total of 
402,354 links were discovered, covering 379,835 distinct subjects. Emphasis has 
been given to properties that represent identifiers, are relatively common, and 
can be found in other data sources. In particular, the properties isbn, isbn13, 
issn, doi, journal, series, periodical, magazine, oclc, pmid and arxiv have been 
used in combination with the title and year. The linking is based on a number 
of LOD dumps that are available for download and on bibliographical websites 
that expose their metadata through APIs. The project consists of an application 
written in Java that processes and links the data, and a triplestore that 
stores the original and the processed data.
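
As an illustration of identifier-based linking, the following is a minimal sketch in Python (the project itself is written in Java, and its actual matching logic is not described here). It assumes that ISBN values are normalised to a common 13-digit form before comparison, so that values from the isbn and isbn13 properties can be matched against the same key. The function names and the example.org URIs are hypothetical, and the sketch only checks the shape of an ISBN, not its check digit.

```python
import re
from typing import Dict, List, Optional, Tuple


def normalize_isbn(raw: str) -> Optional[str]:
    """Return the ISBN as 13 digits, or None if it is not shaped like an ISBN."""
    s = re.sub(r"[\s-]", "", raw).upper()  # drop spaces and hyphens
    if re.fullmatch(r"\d{13}", s):
        return s
    if re.fullmatch(r"\d{9}[\dX]", s):
        # ISBN-10 -> ISBN-13: prefix 978, drop the old check digit,
        # then recompute the check digit with alternating weights 1 and 3.
        core = "978" + s[:9]
        total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
        return core + str((10 - total % 10) % 10)
    return None


def link_by_isbn(citations: Dict[str, str],
                 catalogue: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair citation subjects with catalogue URIs that share a normalised ISBN.

    citations maps a subject URI to its raw ISBN string;
    catalogue maps a raw ISBN string to an external URI (both hypothetical).
    """
    index = {}
    for isbn, uri in catalogue.items():
        key = normalize_isbn(isbn)
        if key is not None:
            index[key] = uri
    links = []
    for subject, isbn in citations.items():
        key = normalize_isbn(isbn)
        if key in index:
            links.append((subject, index[key]))
    return links
```

With this sketch, a citation carrying the ISBN-10 "0-306-40615-2" would be linked to a catalogue record stored under "9780306406157". A real pipeline would presumably also compare title and year, as described above, to reduce false matches.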


The following data sources have been used in the project:

Data source:

- DBPedia citations: Data dump <http://downloads.dbpedia.org/temporary/citations/>
- DBLP - Digital Bibliography & Library Project <http://dblp.uni-trier.de/>: Data dump <http://dblp.l3s.de/dblp++.php>
- BNB - British National Bibliography <http://bnb.bl.uk/>: Data dump <http://www.bl.uk/bibliographic/download.html#lodbnb>
- DNB - Deutsche Nationalbibliografie: Data dump
- BNE - Biblioteca Nacional de España <http://www.bne.es/>: Data dump
- Springer <http://www.springer.com/>: Data dump <http://lod.springer.com/data/dumps>
- WorldCat <https://www.worldcat.org/>: API <https://www.oclc.org/data/data-sets-services.en.html>
- PubMed <https://www.ncbi.nlm.nih.gov/pubmed>: API <https://www.ncbi.nlm.nih.gov/books/NBK25497/>
- arXiv <https://arxiv.org/>: API <https://arxiv.org/help/api/user-manual>


The enwiki-20160305-citation-data.ttl file contains 76,223,926 unique triples 
with 12,391,363 distinct subjects. The links found cover 379,835 / 999,679 = 
approximately 38% of the distinct subjects extracted and 379,835 / 12,391,363 = 
approximately 3% of the distinct subjects in the entire file.
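
The coverage figures can be rechecked from the counts reported in this message. The short Python sketch below only redoes the arithmetic; the variable names are chosen for illustration, and the links-per-subject ratio is derived from the two totals given earlier rather than stated in the project itself.

```python
# Counts as reported in this message.
links = 402_354                # total links discovered
linked_subjects = 379_835      # distinct subjects with at least one link
extracted_subjects = 999_679   # distinct subjects extracted
all_subjects = 12_391_363      # distinct subjects in the whole file


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole


coverage_extracted = percent(linked_subjects, extracted_subjects)
coverage_file = percent(linked_subjects, all_subjects)
links_per_subject = links / linked_subjects

print(f"linked / extracted subjects: {coverage_extracted:.1f}%")
print(f"linked / all subjects:       {coverage_file:.1f}%")
print(f"links per linked subject:    {links_per_subject:.2f}")
```

The first two values round to the 38% and 3% quoted above; the third shows that a linked subject has slightly more than one link on average.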


The links found are contained in the dbpedia_combined_links.nt.zip file and 
can also be queried from the following GraphDB Free SPARQL endpoint: 


A more detailed report about the project can be found at:  





David Nazarian

DBpedia-discussion mailing list

Reply via email to