[CODE4LIB] Using dbpedia to generate EAC-CPF collections

2012-10-03 Thread Ethan Gruber
Hi all,

In the last few weeks, I have undertaken a project to generate EAC-CPF stubs using
dbpedia and VIAF data for the Roman emperors and their relations.  There's
a lot of great information available through dbpedia, and since it's
available in RDF, I put together a PHP script that can start at one point
in dbpedia (e.g., http://dbpedia.org/resource/Augustus) and traverse
through its relations to create a network of stubs using links to parents,
children, spouses, influences, successors, and predecessors provided in the
RDF.  Left unchecked, the script would crawl forward through the Byzantine
period and spread laterally (chronologically speaking), generating a network
of the ruling hierarchy of the West up to the modern period.  It also goes
backwards to the successors of Alexander the Great.  For all I know, it
goes back through all of the Egyptian dynasties to Narmer ca. 3000 BC, but
I haven't let the script go that far.
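
As an illustration of the link-following step described above, here is a minimal sketch in Python (not the PHP of the actual script). It assumes the RDF for a resource has already been fetched and parsed into an RDF/JSON-shaped dictionary. The predicate URIs, the function name related_resources, and the sample data are all assumptions made for the example; the properties the real script follows may differ.

```python
# Hypothetical predicate URIs; the properties the real script reads may differ.
RELATION_PREDICATES = [
    "http://dbpedia.org/ontology/parent",
    "http://dbpedia.org/ontology/child",
    "http://dbpedia.org/ontology/spouse",
    "http://dbpedia.org/ontology/successor",
    "http://dbpedia.org/ontology/predecessor",
]


def related_resources(rdf_json, subject, predicates=RELATION_PREDICATES):
    """Return {predicate: [object URIs]} for one subject.

    rdf_json is shaped like RDF/JSON:
    {subject: {predicate: [{"type": ..., "value": ...}, ...]}}
    """
    found = {}
    for predicate in predicates:
        objects = rdf_json.get(subject, {}).get(predicate, [])
        # Only URI objects can be followed; plain literals are skipped.
        uris = [o["value"] for o in objects if o.get("type") == "uri"]
        if uris:
            found[predicate] = uris
    return found


# Made-up sample data for illustration, not a real dbpedia response.
SAMPLE = {
    "http://dbpedia.org/resource/Augustus": {
        "http://dbpedia.org/ontology/successor": [
            {"type": "uri", "value": "http://dbpedia.org/resource/Tiberius"}
        ],
        "http://dbpedia.org/ontology/spouse": [
            {"type": "uri", "value": "http://dbpedia.org/resource/Livia"},
            {"type": "literal", "value": "Scribonia"},
        ],
    }
}

relations = related_resources(SAMPLE, "http://dbpedia.org/resource/Augustus")
for predicate, uris in relations.items():
    print(predicate, uris)
```

Each URI returned this way is a candidate for the next stub, which is what lets a crawl spread from one starting resource.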

The script is fairly generalizable, and can begin at any dbpedia resource.
It's available at
https://github.com/ewg118/xEAC/blob/master/misc/dbpedia-to-eac.php

I should also note that this is a work in progress.  To execute the script,
you'll need to place a temp folder in the same place you download/execute
it (for writing EAC records).
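
To give a rough idea of what a stub written to such a folder could look like, here is a minimal sketch in Python (not the PHP of the actual script). The function name write_stub, the record ID, and the sample names are hypothetical, and the output is only a skeleton: a schema-valid EAC-CPF record needs further control elements that are omitted here.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

EAC_NS = "urn:isbn:1-931666-33-4"  # EAC-CPF namespace
XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of xml:lang


def write_stub(record_id, names, out_dir):
    """Write a skeletal EAC-CPF record with one nameEntry per language.

    names maps a language code to a name. Hypothetical helper: the
    real script's output structure may differ.
    """
    ET.register_namespace("", EAC_NS)  # serialize EAC-CPF as the default namespace
    root = ET.Element(f"{{{EAC_NS}}}eac-cpf")
    control = ET.SubElement(root, f"{{{EAC_NS}}}control")
    ET.SubElement(control, f"{{{EAC_NS}}}recordId").text = record_id
    description = ET.SubElement(root, f"{{{EAC_NS}}}cpfDescription")
    identity = ET.SubElement(description, f"{{{EAC_NS}}}identity")
    ET.SubElement(identity, f"{{{EAC_NS}}}entityType").text = "person"
    for lang, name in sorted(names.items()):
        entry = ET.SubElement(
            identity, f"{{{EAC_NS}}}nameEntry", {f"{{{XML_NS}}}lang": lang}
        )
        ET.SubElement(entry, f"{{{EAC_NS}}}part").text = name
    os.makedirs(out_dir, exist_ok=True)  # create the output folder if missing
    path = os.path.join(out_dir, record_id + ".xml")
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)
    return path


# Usage: write one stub into a throwaway directory and show it.
with tempfile.TemporaryDirectory() as scratch:
    stub_path = write_stub(
        "augustus",
        {"en": "Augustus", "la": "Imperator Caesar Augustus"},
        os.path.join(scratch, "temp"),
    )
    with open(stub_path, encoding="utf-8") as handle:
        print(handle.read())
```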

At a glance, here's what it does:

-Creates nameEntries for all of the names available in various languages in
dbpedia
-If a VIAF ID is available in the RDF, pulls some alternate record IDs
from VIAF, as well as birth and death dates
-Can pull in subjects, occupations, and related resources on the web
-Generates corporate/personal/family relations from the
parents/children/spouses/influences/successors/predecessors/dynasties
linked in dbpedia.  These relations are added to an array, which is
processed continually until, presumably, it reaches the end of time.
-Lets you specify an end record to attempt to break this chain, but I
cannot guarantee that it'll work.  Anastasius (emperor of Rome ca. 500 AD)
does successfully terminate the Augustus chain.
-Imports birth and death places (and associated birth and death dates, if
available)
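
The array-driven processing and the end record in the list above can be sketched as a breadth-first walk. This is a minimal Python illustration under stated assumptions, not the logic of the PHP script: the function crawl, the max_records cap, the set of already-seen resources, and the toy graph are all hypothetical.

```python
from collections import deque


def crawl(start, neighbors, end=None, max_records=None):
    """Breadth-first walk; returns resources in the order they were processed.

    neighbors is any callable returning the related resources of one
    resource. end and max_records are hypothetical stopping controls.
    """
    queue = deque([start])
    seen = {start}  # prevents re-queuing a resource reached by two paths
    processed = []
    while queue:
        if max_records is not None and len(processed) >= max_records:
            break  # hard cap, independent of the end record
        current = queue.popleft()
        processed.append(current)
        if current == end:
            continue  # keep the end record but do not follow its links
        for nxt in neighbors(current):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return processed


# Toy relation graph for illustration only.
TOY = {
    "Augustus": ["Tiberius", "Livia"],
    "Tiberius": ["Augustus", "Caligula"],
    "Livia": ["Augustus", "Tiberius"],
    "Caligula": ["Claudius"],
    "Claudius": ["Nero"],
}


def toy_neighbors(resource):
    return TOY.get(resource, [])


print(crawl("Augustus", toy_neighbors, end="Caligula"))
print(crawl("Augustus", toy_neighbors, max_records=2))
```

In a walk like this, an end record only stops the crawl if every path forward passes through it. If some other relation bypasses it, the walk continues, which is one plausible reason (an assumption here, not something stated in the thread) to pair an end record with a cap on the number of records.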

I think that these stubs are a good starting point for handing off the
management of EAC content to subject specialists who can add chronological
and geographical context.  I wrote a bit more about this script and the
process applied to xEAC, an XForms-based engine for creating, editing,
managing, and publishing EAC-CPF collections at
http://eaditor.blogspot.com/2012/10/using-dbpedia-to-jumpstart-eac-cpf.html

There's a prototype collection of the Roman Empire; if anyone is interested
in taking a look at it, drop me a line off the list.

Ethan


Re: [CODE4LIB] Using dbpedia to generate EAC-CPF collections

2012-10-03 Thread Michele R Combs
Wow.  That's pretty spiff!  I'd love to see your Roman Empire SNAC.  Can you
send me the info?

Michele
