Alistair Miles
Mon, 16 Feb 2009 01:13:44 -0800
Hi Karen,

On Fri, Feb 13, 2009 at 06:46:37AM -0800, Karen Coyle wrote:
> Alistair,
>
> I did start an analysis of RDA and MARC, but didn't get very far. I'll
> take that up again. What I was mainly finding is that there are a lot of
> RDA elements that are listed for more than one MARC element, e.g.
>
> $a Personal name* = 9.2.2 Preferred Name for the Person*
> $b Numeration = *9.2.2 Preferred Name for the Person

Yes, I expect there will be lots of issues like this, in both
directions. Please do continue your analysis; this type of insight is
very useful.

I should say that I don't hope to create either a complete or perfect
mapping from mods to RDF/RDA/FRBR. Rather, I hope to map just enough to
capture a significant amount of useful information, to demonstrate the
potential for further work in this direction.

Cheers,

Alistair

>
> There are ones that go the other way, as well, where RDA is more
> specific than MARC. It made me wonder how it is that we use the specific
> MARC elements: are they needed for display? do they help input? are they
> arbitrary?
>
> I haven't looked at MODS, however, and there isn't a mapping provided
> between MODS and RDA. I'll think about that, however.
>
> kc
>
> *Alistair Miles wrote:
>> Hi all,
>>
>> This is just an update to say that I've converted the LOC/scriblio
>> data to marc xml and from there to mods xml. My next step is to do
>> some analysis of the loc data in mods xml to get an overview of the
>> elements used, then to try to design at least a partial mapping from
>> mods xml to RDF using the RDA and FRBR schemas.
>>
>> FYI the marc xml and mods xml versions of the LOC/scriblio data can be
>> downloaded from the links below...
>>
>> http://dcmi-rda.s3.amazonaws.com/locdata/part01-marcxml.tar.gz
>> http://dcmi-rda.s3.amazonaws.com/locdata/part01-modsxml.tar.gz
>> http://dcmi-rda.s3.amazonaws.com/locdata/part02-marcxml.tar.gz
>> http://dcmi-rda.s3.amazonaws.com/locdata/part02-modsxml.tar.gz
>> [...]
>> http://dcmi-rda.s3.amazonaws.com/locdata/part29-marcxml.tar.gz
>> http://dcmi-rda.s3.amazonaws.com/locdata/part29-modsxml.tar.gz
>>
>> Each download is a gzipped tar containing a *set* of up to 25 xml
>> files. Each of these files is a 10,000 record split of the data in the
>> corresponding part. I broke each part into 10,000 record splits so I
>> could process the transformations more easily.
>>
>> N.B. there is a bug in part 13 split 25: for some reason the marc xml
>> output was incomplete, so up to 10,000 records could be missing.
>>
>> FWIW I initially tried the conversions without splitting each
>> part. I.e. I converted each original marc file into a single marc xml
>> file, then tried to transform that to a mods xml file via
>> xsltproc. However I found you need more than 7GB ram to do the marcxml
>> to modsxml transform on a whole part (I tried it on a large ec2
>> instance), so that's when I decided to split each part into smaller
>> chunks, which I figured would be faster to process and more amenable
>> to parallel processing (transforming all the splits from marcxml to
>> modsxml took a couple of hours on a c1.xlarge ec2 instance, running up
>> to 10 transformations in parallel; it can also be done on a laptop,
>> but takes ~10 times longer).
>>
>> Btw if anyone else has experience of the marcxml->modsxml transform on
>> a file of similar size do let me know, I don't do a lot of xslt-ing so
>> may be missing some tricks for making it work on smaller computers.
>>
>> Cheers,
>>
>> Alistair
>>
>>
>> On Mon, Dec 22, 2008 at 03:31:50PM -0500, Ed Summers wrote:
>>
>>> Hey Alistair:
>>>
>>> On Mon, Dec 22, 2008 at 1:16 PM, Alistair Miles
>>> <alistair.mi...@zoo.ox.ac.uk> wrote:
>>>
>>>> Any tips for how I could turn these data into RDF?
>>>>
>>> If you want to work specifically with that dataset you could download
>>> the different parts Karen pointed you to, and convert to MARCXML using
>>> an efficient tool like yaz-marcdump [1].
>>> yaz-marcdump is nice: it will
>>> convert from MARC-8 to UTF-8.
>>>
>>> Once you've got it in MARCXML you could then use a stylesheet like
>>> LC's [2] to convert to DublinCore flavored RDF. This might be kinda
>>> lossy for your RDA work though, so you might want MARCXML->MODS [3],
>>> and then use the MODS->RDF conversion that the Simile folks created
>>> (which Karen also pointed you to) [4].
>>>
>>> In fact Simile used that stylesheet on their own MIT Library Catalog
>>> MARC data (Barton) and still seem to have the result online [5]. So
>>> perhaps just using the Barton data is the quickest way to begin
>>> playing with what once was MARC data as RDF? To my knowledge Stefano
>>> Mazzocchi simply created an RDF vocabulary that mirrors the MODS XML
>>> Schema, but I haven't looked at it in a while.
>>>
>>> Another thing worth checking out might be Rob Styles' work [6] with
>>> other people at Talis at converting MARC with full fidelity to RDF.
>>> Perhaps he has some tools (or data) at his disposal? Rob, you are on
>>> here, right?
>>>
>>> I'd be willing to lend a hand with some of this if necessary, so just
>>> let me know if you think I can help.
>>>
>>> //Ed
>>>
>>> [1] http://www.indexdata.com/yaz/doc/yaz-marcdump.tkl
>>> [2] http://www.loc.gov/standards/marcxml/xslt/MARC21slim2RDFDC.xsl
>>> [3] http://www.loc.gov/standards/mods/v3/MARC21slim2MODS3.xsl
>>> [4] http://simile.mit.edu/wiki/MARC/MODS_RDFizer
>>> [5] http://simile.mit.edu/wiki/Dataset:_Barton
>>> [6] http://events.linkeddata.org/ldow2008/papers/02-styles-ayers-semantic-marc.pdf
>>>
>>
>>
>
> --
> -----------------------------------
> Karen Coyle / Digital Library Consultant
> kco...@kcoyle.net http://www.kcoyle.net
> ph.: 510-540-7596 skype: kcoylenet
> fx.: 510-848-3913
> mo.: 510-435-8234
> ------------------------------------

--
Alistair Miles
Senior Computing Officer
Image Bioinformatics Research Group
Department of Zoology
The Tinbergen Building
University of Oxford
South Parks Road
Oxford
OX1 3PS
United Kingdom
Web: http://purl.org/net/aliman
Email: alistair.mi...@zoo.ox.ac.uk
Tel: +44 (0)1865 281993
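
P.S. For anyone wanting to repeat the split-and-transform approach described in the quoted update, here is a rough Python sketch of the idea: batch records into fixed-size groups, then run one xsltproc process per split with a bounded number running at once. It is not the script that was actually used. The helper names (`chunked`, `xslt_command`, `transform_all`) and the assumption that split file names contain "marcxml" are hypothetical; the only external tool assumed is xsltproc with its `--output` option.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from pathlib import Path


def chunked(records, size=10_000):
    """Yield successive lists of at most `size` records (hypothetical helper)."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def xslt_command(stylesheet, src, dst):
    """Build the xsltproc argument list for transforming one split."""
    return ["xsltproc", "--output", str(dst), str(stylesheet), str(src)]


def transform_all(stylesheet, sources, out_dir, workers=10, run=subprocess.run):
    """Transform each split independently, at most `workers` at a time.

    Assumes (for illustration only) that each source file name contains
    "marcxml", which is swapped for "modsxml" in the output name.
    """
    out_dir = Path(out_dir)
    jobs = [
        xslt_command(
            stylesheet,
            src,
            out_dir / Path(src).name.replace("marcxml", "modsxml"),
        )
        for src in sources
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Threads suffice: each one only waits on an external process.
        return list(pool.map(lambda cmd: run(cmd, check=True), jobs))
```

Because each split is transformed in its own process, peak memory is bounded by the size of one split times the number of workers, rather than by the size of a whole part.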