Hello,

 

Concerning the DBPedia citations & references challenge, we would like to 
report on a project that aims to map DBPedia's citations to existing 
bibliographical data. Even though the deadline for the challenge has passed, 
we would be grateful for your feedback on the project.

 

More specifically, a number of properties in the 
enwiki-20160305-citation-data.ttl file have been used to link the subjects of 
the file's triples to URIs from other bibliographical sources. As a result, a 
total of 402,354 links were discovered, which correspond to 379,835 distinct 
subjects. Emphasis has been given to properties that represent identifiers, 
can be found in other data sources and are relatively common. In particular, 
the properties isbn, isbn13, issn, doi, journal, series, periodical, magazine, 
oclc, pmid and arxiv have been used in combination with the title and year. 
The linking has been based on a number of LOD dumps that are available for 
download and on bibliographical websites that provide their metadata through 
APIs. The project consists of an application written in Java that processes 
and links the data, and a triplestore that stores the original and the 
processed data.
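To illustrate what linking on an identifier property can look like, here is a minimal sketch in Python. It is not the project's Java code: the function names (to_isbn13, link_by_isbn), the example.org URIs and the choice to normalize every ISBN to its 13-digit form before matching are all assumptions made for illustration.

```python
import re


def _isbn13_check_digit(first12):
    """Compute the ISBN-13 check digit for the first 12 digits."""
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(first12))
    return str((10 - total % 10) % 10)


def to_isbn13(raw):
    """Return a hyphen-free ISBN-13, or None if the value is not a valid ISBN."""
    digits = re.sub(r"[^0-9Xx]", "", raw).upper()
    if len(digits) == 10:
        if not re.fullmatch(r"\d{9}[\dX]", digits):
            return None
        # ISBN-10 checksum: weights 10..1, total must be divisible by 11.
        total = sum((10 - i) * (10 if c == "X" else int(c))
                    for i, c in enumerate(digits))
        if total % 11 != 0:
            return None
        core = "978" + digits[:9]
        return core + _isbn13_check_digit(core)
    if len(digits) == 13 and digits.isdigit():
        if _isbn13_check_digit(digits[:12]) == digits[12]:
            return digits
    return None


def link_by_isbn(citations, catalogue):
    """Pair citation subjects with catalogue URIs that share a normalized ISBN.

    Both arguments are iterables of (uri, isbn_string) tuples.
    """
    index = {}
    for uri, isbn in catalogue:
        key = to_isbn13(isbn)
        if key is not None:
            index.setdefault(key, []).append(uri)
    links = []
    for subject, isbn in citations:
        key = to_isbn13(isbn)
        for target in index.get(key, []):
            links.append((subject, target))
    return links
```

With this sketch, a citation carrying "0306406152" and a catalogue record carrying "978-0-306-40615-7" end up under the same key and are linked. Other identifiers (issn, doi, oclc, pmid, arxiv) would need their own normalization rules.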

 

The following data sources have been used in the project:


Data source                                    Type       Unique triples in local data dump
DBPedia citations                              Data dump  76.2M
DBLP - Digital Bibliography & Library Project  Data dump  88.1M
BNB - British National Bibliography            Data dump  111M
DNB - Deutsche Nationalbibliografie            Data dump  414.2M
BNE - Biblioteca Nacional de España            Data dump  68.7M
Springer                                       Data dump  3.3M
WorldCat                                       API        2.1M
PubMed                                         API        0.629M
arXiv                                          API        0.021M

Links:

DBPedia citations
  Data dump: <http://downloads.dbpedia.org/temporary/citations/>
DBLP
  Source:    <http://dblp.uni-trier.de/>
  Data dump: <http://dblp.l3s.de/dblp++.php>
BNB
  Source:    <http://bnb.bl.uk/>
  Data dump: <http://www.bl.uk/bibliographic/download.html#lodbnb>
DNB
  Source:    <http://www.dnb.de/EN/Service/DigitaleDienste/DNBBibliografie/dnbbibliografie_node.html>
  Data dump: <http://datendienst.dnb.de/cgi-bin/mabit.pl?userID=opendata&pass=opendata&cmd=login>
BNE
  Source:    <http://www.bne.es/>
  Data dump: <http://www.bne.es/en/Inicio/Perfiles/Bibliotecarios/DatosEnlazados/DescargaFicheros/>
Springer
  Source:    <http://www.springer.com/>
  Data dump: <http://lod.springer.com/data/dumps>
WorldCat
  Source:    <https://www.worldcat.org/>
  API:       <https://www.oclc.org/data/data-sets-services.en.html>
PubMed
  Source:    <https://www.ncbi.nlm.nih.gov/pubmed>
  API:       <https://www.ncbi.nlm.nih.gov/books/NBK25497/>
arXiv
  Source:    <https://arxiv.org/>
  API:       <https://arxiv.org/help/api/user-manual>

 

The enwiki-20160305-citation-data.ttl file contains 76,223,926 unique triples 
with 12,391,363 distinct subjects. The links found in the project cover 
379,835 / 999,679 = 38% of the distinct subjects extracted and 379,835 / 
12,391,363 = 3% of all distinct subjects in the file.
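The two coverage figures can be rechecked with a few lines of Python. The counts are the ones reported above; the variable and function names are only illustrative.

```python
# Counts as reported in this message.
LINKED_SUBJECTS = 379_835        # distinct subjects with at least one link
EXTRACTED_SUBJECTS = 999_679     # distinct subjects extracted for linking
ALL_SUBJECTS = 12_391_363        # distinct subjects in the citation file


def coverage(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


share_of_extracted = coverage(LINKED_SUBJECTS, EXTRACTED_SUBJECTS)
share_of_file = coverage(LINKED_SUBJECTS, ALL_SUBJECTS)

print(f"{share_of_extracted:.1f}% of the extracted subjects")
print(f"{share_of_file:.1f}% of all distinct subjects in the file")
```

Rounded to whole percentages, these give the 38% and 3% quoted above.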

 

The links found are contained in the dbpedia_combined_links.nt.zip 
<https://dl.dropboxusercontent.com/s/9dm9qotlgzumcqc/dbpedia_combined_links.nt.zip>
  file and can also be queried from the following GraphDB Free SPARQL endpoint: 
http://lod.csd.auth.gr:7200/sparql
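For readers who want to query the endpoint from a script, here is a minimal sketch using only the Python standard library. It assumes that the URL above accepts SPARQL 1.1 Protocol GET requests and returns JSON results; that has not been verified here, and a GraphDB installation may expose the actual query URL under a different path. The query is a generic probe, because the predicate used for the links is not stated in this message. The helper names build_request and run_query are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Endpoint as given in this message (assumed to speak the SPARQL 1.1 Protocol).
ENDPOINT = "http://lod.csd.auth.gr:7200/sparql"

# Generic probe query; replace with a query over the actual link predicate.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_request(endpoint, query):
    """Build a GET request carrying the query in the 'query' parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(endpoint, query, timeout=30):
    """Send the query and return the list of result bindings."""
    request = build_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]
```

Calling run_query(ENDPOINT, QUERY) returns a list of dictionaries, one per result row, in which row["s"]["value"] holds the subject URI.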

 

A more detailed report about the project can be found at:  

https://dl.dropboxusercontent.com/s/botmb4ax8d7ixug/Report_citation-challenge.pdf

 

 

Respectfully,

David Nazarian

_______________________________________________
DBpedia-discussion mailing list
DBpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion