On Jun 5, 2015, at 8:20 AM, Ethan Gruber wrote:

>> Does anybody here have experience reading the SGML/XML files representing
>> the content of EEBO?
> 
> Are these in TEI? Back when I worked for the University of Virginia
> Library, I did a lot of clean up work and migration of Chadwyck-Healey
> stuff into TEI-P4 compliant XML (thousands of files), but unfortunately all
> of the Perl scripts to migrate old garbage SGML into XML are probably gone.
> 
> How many of these things are really worth keeping, i.e., were not digitized
> by any other organization that has freely published them online?


The data I have comes in two flavors: 1) some form of SGML, and 2) an XML
that is TEI-like but not actual TEI. All of the files are worth keeping
because each one gives me the basic bibliographic information (id, author,
title, date, keywords/subjects) as well as the transcribed text. (No images.)
Given such data, I think I can provide interesting, cool, and “kewl” services.
With the id number, I may also be able to link to the corresponding scanned
image. Wish me luck. —ELM
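For illustration only, here is a minimal Python sketch of pulling those fields
(id, author, title, date, subjects, transcribed text) out of one of the XML
files with the standard library's xml.etree.ElementTree. The element names
(IDNO, AUTHOR, TITLE, DATE, TERM, TEXT) and the sample record are assumptions
loosely modelled on a TEI header, not taken from the actual files; the SGML
flavor would need converting to well-formed XML before this could read it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record. Element names are ASSUMPTIONS; adjust them to
# whatever the real TEI-like files actually use.
SAMPLE = """<ETS>
  <HEADER>
    <IDNO TYPE="DLPS">A00001</IDNO>
    <AUTHOR>Doe, John</AUTHOR>
    <TITLE>A sample treatise</TITLE>
    <DATE>1600</DATE>
    <KEYWORDS>
      <TERM>Theology</TERM>
      <TERM>Sermons</TERM>
    </KEYWORDS>
  </HEADER>
  <TEXT><BODY><P>First paragraph.</P><P>Second paragraph.</P></BODY></TEXT>
</ETS>"""


def first_text(root, tag):
    """Return the stripped text of the first element named `tag`, or None."""
    el = root.find(f".//{tag}")
    if el is None or el.text is None:
        return None
    return el.text.strip()


def extract_record(xml_string):
    """Collect bibliographic fields and the full transcribed text of one file."""
    root = ET.fromstring(xml_string)
    text_el = root.find(".//TEXT")
    # itertext() walks every descendant text node; join, then collapse whitespace.
    full_text = ""
    if text_el is not None:
        full_text = " ".join(" ".join(text_el.itertext()).split())
    return {
        "id": first_text(root, "IDNO"),
        "author": first_text(root, "AUTHOR"),
        "title": first_text(root, "TITLE"),
        "date": first_text(root, "DATE"),
        "subjects": [t.text.strip() for t in root.iter("TERM") if t.text],
        "text": full_text,
    }


if __name__ == "__main__":
    for key, value in extract_record(SAMPLE).items():
        print(f"{key}: {value}")
```

The id pulled out this way is the value one would then use to build a link to
the scanned image, assuming such a lookup exists.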
