[CODE4LIB] rdf and doi's

Eric Lease Morgan Tue, 15 Jan 2019 06:38:50 -0800

How might I exploit & learn from a set of RDF files harvested from DOIs?


For a good time, I have written a suite of software to harvest bibliographic 
data from Web of Science, cache the results, and report on the whole. [1] Along 
the way I programmatically collect DOIs and then resolve them. The results 
include RDF streams. ("Thanks, Kevin Ford!") For example:

  curl -i -L -H "Accept: application/rdf+xml" http://dx.doi.org/10.3352/jeehp.2013.10.3

And the result:

  <rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:j.0="http://purl.org/dc/terms/"
    xmlns:j.1="http://prismstandard.org/namespaces/basic/2.1/"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:j.2="http://purl.org/ontology/bibo/"
    xmlns:j.3="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://dx.doi.org/10.3352/jeehp.2013.10.3">
    <j.0:isPartOf>
    <j.2:Journal rdf:about="http://id.crossref.org/issn/1975-5937">
      <owl:sameAs>urn:issn:1975-5937</owl:sameAs>
      <j.0:title>Journal of Educational Evaluation for Health Professions</j.0:title>
      <j.1:issn>1975-5937</j.1:issn>
      <j.2:issn>1975-5937</j.2:issn>
    </j.2:Journal>
    </j.0:isPartOf>
    <j.0:creator>
    <j.3:Person rdf:about="http://id.crossref.org/contributor/sun-huh-112veziy3vi1o">
      <j.3:name>Sun Huh</j.3:name>
      <j.3:familyName>Huh</j.3:familyName>
      <j.3:givenName>Sun</j.3:givenName>
    </j.3:Person>
    </j.0:creator>
    <j.0:title>Revision of the instructions to authors to require... </j.0:title>
    <j.1:doi>10.3352/jeehp.2013.10.3</j.1:doi>
    <j.0:date rdf:datatype="http://www.w3.org/2001/XMLSchema#date"
    >2013-04-30</j.0:date>
    <owl:sameAs rdf:resource="info:doi/10.3352/jeehp.2013.10.3"/>
    <j.0:identifier>10.3352/jeehp.2013.10.3</j.0:identifier>
    <j.2:volume>10</j.2:volume>
    <j.2:pageStart>3</j.2:pageStart>
    <j.1:startingPage>3</j.1:startingPage>
    <j.0:publisher>XMLArchive</j.0:publisher>
    <owl:sameAs rdf:resource="doi:10.3352/jeehp.2013.10.3"/>
    <j.1:volume>10</j.1:volume>
    <j.2:doi>10.3352/jeehp.2013.10.3</j.2:doi>
  </rdf:Description>
  </rdf:RDF>


That's a pretty rich RDF stream! [2]
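
For harvesting many DOIs in a loop, the curl request above could be sketched in Python using only the standard library. This is a minimal sketch, not the code in my repository: the function names rdf_request and fetch_rdf are made up for illustration, and the default resolver is simply the one used in the curl example.

```python
import urllib.request


def rdf_request(doi, resolver="http://dx.doi.org/"):
    """Build a request that asks the DOI resolver for RDF/XML.

    The Accept header is what triggers content negotiation, exactly
    like -H "Accept: application/rdf+xml" in the curl command.
    """
    return urllib.request.Request(
        resolver + doi,
        headers={"Accept": "application/rdf+xml"},
    )


def fetch_rdf(doi, timeout=30):
    """Resolve the DOI and return the response body as bytes.

    urlopen follows HTTP redirects by default, which plays the role
    of curl's -L flag.
    """
    with urllib.request.urlopen(rdf_request(doi), timeout=timeout) as response:
        return response.read()


# Usage (needs network access, so it is left commented out):
# rdf = fetch_rdf("10.3352/jeehp.2013.10.3")
```

The request is built separately from the fetch so the headers can be inspected, or the request reused, without touching the network.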

As of right now, I have about 8000 of these streams representing publications 
of faculty here at my university. I can easily get tens of thousands more. How 
might I take advantage of this data? How can I go beyond parsing the RDF with 
XPath, stuffing the results into a database, and applying SQL to the result? 
How can I truly exploit the nature of the RDF and possibly manifest it as linked 
data?

To answer my own question, I might put the data into a triple store, and then 
try to answer questions such as: which authors are central, which journals are 
central, which authors are "related" to one another, etc.
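
As a rough illustration of the "which authors are central" question, here is a minimal sketch that reduces each stream to (paper, creator, author) triples and then counts distinct co-authors per author, which is one simple centrality measure among many. It is not a triple store, and the parsing only handles the nesting seen in the sample above; a real RDF parser (rdflib, for example) would be more robust. The example.org URIs, the sample_stream helper, and the function names are all invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
DCT = "{http://purl.org/dc/terms/}"
FOAF = "{http://xmlns.com/foaf/0.1/}"


def authorship_triples(rdf_xml):
    """Yield (paper URI, "creator", author URI) triples from one stream.

    Assumes the shape of the sample above: rdf:Description elements
    containing dcterms:creator elements that wrap a foaf:Person.
    """
    root = ET.fromstring(rdf_xml)
    for paper in root.findall(RDF + "Description"):
        paper_uri = paper.get(RDF + "about")
        for person in paper.findall(DCT + "creator/" + FOAF + "Person"):
            yield (paper_uri, "creator", person.get(RDF + "about"))


def coauthor_degree(triples):
    """Count distinct co-authors per author (degree in the co-author graph)."""
    authors_by_paper = {}
    for paper, _, author in triples:
        authors_by_paper.setdefault(paper, set()).add(author)
    neighbours = {}
    for authors in authors_by_paper.values():
        for author in authors:
            neighbours.setdefault(author, set())
        for a, b in combinations(sorted(authors), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return Counter({a: len(n) for a, n in neighbours.items()})


def sample_stream(paper, authors):
    """Build a tiny, made-up RDF/XML stream for demonstration only."""
    people = "".join(
        f'<dct:creator><foaf:Person rdf:about="{a}"/></dct:creator>'
        for a in authors
    )
    return (
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
        'xmlns:dct="http://purl.org/dc/terms/" '
        'xmlns:foaf="http://xmlns.com/foaf/0.1/">'
        f'<rdf:Description rdf:about="{paper}">{people}</rdf:Description>'
        "</rdf:RDF>"
    )


# Two invented papers that share one author.
BASE = "http://example.org/"
streams = [
    sample_stream(BASE + "paper/1", [BASE + "person/a", BASE + "person/b"]),
    sample_stream(BASE + "paper/2", [BASE + "person/a", BASE + "person/c"]),
]
triples = [t for s in streams for t in authorship_triples(s)]
degrees = coauthor_degree(triples)
print(degrees.most_common(1))
```

The same pattern would apply to journals by following dcterms:isPartOf instead of dcterms:creator. With thousands of streams, a SPARQL query against a real triple store would replace the hand-rolled loops.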

What do you think?

[1] https://github.com/ericleasemorgan/api-taskforce
[2] And this rich data does not even take into account the cool, sometimes 
full-text, URLs/URIs found in the HTTP Link header!

-- 
Eric Lease Morgan
Digital Initiatives Librarian, Navari Family Center for Digital Scholarship
Hesburgh Libraries

University of Notre Dame
250E Hesburgh Library
Notre Dame, IN 46556
o: 574-631-8604
e: [email protected]
w: cds.library.nd.edu
