Re: [CODE4LIB] Models of MARC in RDF

Owen Stephens Wed, 07 Dec 2011 01:13:36 -0800

Fair point. Just instinct on my part that putting it in a triple is a bit ugly 
:)


It probably doesn't make any difference, although I don't think storing it in 
a triple ensures that it sticks to the object (you could store the triple 
anywhere as well).
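
To make the two options in this thread concrete, here is a minimal sketch in Python using plain tuples as triples. It is not anyone's actual implementation: the rawMarc predicate, the example.org URLs and the sample record text are invented for illustration, and dcterms:hasFormat is a real Dublin Core term used here only as one plausible way to point at an alternative representation.

```python
# Sketch only: triples are plain (subject, predicate, object) tuples.

# Object URI taken from the example quoted later in this thread.
OBJECT_URI = "http://libraries.ucsd.edu/ark:/20775/bb0648473d"

# Hypothetical predicate for an embedded record dump (invented for this sketch).
RAW_MARC = "http://example.org/terms/rawMarc"
# Real Dublin Core term: a related resource with the same content in another format.
HAS_FORMAT = "http://purl.org/dc/terms/hasFormat"


def embed_marc(subject, marc_text):
    """Option 1: carry the whole record as one literal-valued triple."""
    return (subject, RAW_MARC, marc_text)


def link_marc(subject, marc_url):
    """Option 2: point at an alternative representation stored elsewhere."""
    return (subject, HAS_FORMAT, marc_url)


def to_ntriples(triple, literal):
    """Serialise one triple as an N-Triples line (minimal escaping only)."""
    s, p, o = triple
    if literal:
        escaped = (
            o.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        obj = f'"{escaped}"'
    else:
        obj = f"<{o}>"
    return f"<{s}> <{p}> {obj} ."


# Invented placeholder text standing in for a real MARC dump.
sample_marc = "LDR placeholder leader\n245 00 $a Placeholder title"

embedded = embed_marc(OBJECT_URI, sample_marc)
# Hypothetical location of the MARC file; any stable URL would do.
linked = link_marc(OBJECT_URI, "http://example.org/marc/bb0648473d.mrc")

print(to_ntriples(embedded, literal=True))
print(to_ntriples(linked, literal=False))
```

Either way the result is just a triple whose subject is the object's URI. Whether it "sticks with the object" depends on where that triple is stored and how the object is packaged, not on which of the two forms is used.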

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: [email protected]
Telephone: 0121 288 6936

On 6 Dec 2011, at 22:43, Fleming, Declan wrote:

> Hi - point at it where?  We could point back to the library catalog that we 
> harvested in the MARC to MODS to RDF process, but what if that goes away?  
> Why not write ourselves a 1K insurance policy that sticks with the object for 
> its life?
> 
> D
> 
> -----Original Message-----
> From: Code for Libraries [mailto:[email protected]] On Behalf Of Owen 
> Stephens
> Sent: Tuesday, December 06, 2011 8:06 AM
> To: [email protected]
> Subject: Re: [CODE4LIB] Models of MARC in RDF
> 
> I'd suggest that rather than shove it in a triple it might be better to point 
> at alternative representations, including MARC if desirable (keep meaning to 
> blog some thoughts about progressively enhanced metadata...)
> 
> Owen
> 
> Owen Stephens
> Owen Stephens Consulting
> Web: http://www.ostephens.com
> Email: [email protected]
> Telephone: 0121 288 6936
> 
> On 6 Dec 2011, at 15:44, Karen Coyle wrote:
> 
>> Quoting "Fleming, Declan" <[email protected]>:
>> 
>>> Hi - I'll note that the mapping decisions were made by our metadata 
>>> services (then Cataloging) group, not by the tech folks making it all 
>>> work, though we were all involved in the discussions.  One idea that 
>>> came up was to do a, perhaps, lossy translation, but also stuff one 
>>> triple with a text dump of the whole MARC record just in case we 
>>> needed to pull some other element out of it later.  We didn't do 
>>> that, but I still like the idea.  Ok, it was my idea.  ;)
>> 
>> I like that idea! Now that "disk space" is no longer an issue, it makes good 
>> sense to keep around the "original state" of any data that you transform, 
>> just in case you change your mind. I hadn't thought about incorporating the 
>> entire MARC record string in the transformation, but as I recall the average 
>> size of a MARC record is somewhere around 1K, which really isn't all that 
>> much by today's standards.
>> 
>> (As an old-timer, I remember running the entire Univ. of California 
>> union catalog on 35 megabytes, something that would now be considered 
>> a smallish email attachment.)
>> 
>> kc
>> 
>>> 
>>> D
>>> 
>>> -----Original Message-----
>>> From: Code for Libraries [mailto:[email protected]] On Behalf 
>>> Of Esme Cowles
>>> Sent: Monday, December 05, 2011 11:22 AM
>>> To: [email protected]
>>> Subject: Re: [CODE4LIB] Models of MARC in RDF
>>> 
>>> I looked into this a little more closely, and it turns out it's a little 
>>> more complicated than I remembered.  We built support for transforming to 
>>> MODS using the MARC21slim2MODS.xsl stylesheet, but don't use that.  
>>> Instead, we use custom Java code to do the mapping.
>>> 
>>> I don't have a lot of public examples, but there's at least one public 
>>> object for which you can view the MARC in our OPAC:
>>> 
>>> http://roger.ucsd.edu/search/.b4827884/.b4827884/1,1,1,B/detlmarc~1234567&FF=&1,0,
>>> 
>>> The public display in our digital collections site:
>>> 
>>> http://libraries.ucsd.edu/ark:/20775/bb0648473d
>>> 
>>> The RDF for the MODS looks like:
>>> 
>>>       <mods:classification rdf:parseType="Resource">
>>>           <mods:authority>local</mods:authority>
>>>           <rdf:value>FVLP 222-1</rdf:value>
>>>       </mods:classification>
>>>       <mods:identifier rdf:parseType="Resource">
>>>           <mods:type>ARK</mods:type>
>>>           <rdf:value>http://libraries.ucsd.edu/ark:/20775/bb0648473d</rdf:value>
>>>       </mods:identifier>
>>>       <mods:name rdf:parseType="Resource">
>>>           <mods:namePart>Brown, Victor W</mods:namePart>
>>>           <mods:type>personal</mods:type>
>>>       </mods:name>
>>>       <mods:name rdf:parseType="Resource">
>>>           <mods:namePart>Amateur Film Club of San Diego</mods:namePart>
>>>           <mods:type>corporate</mods:type>
>>>       </mods:name>
>>>       <mods:originInfo rdf:parseType="Resource">
>>>           <mods:dateCreated>[196-]</mods:dateCreated>
>>>       </mods:originInfo>
>>>       <mods:originInfo rdf:parseType="Resource">
>>>           <mods:dateIssued>2005</mods:dateIssued>
>>>           <mods:publisher>Film and Video Library, University of California, San Diego, La Jolla, CA 92093-0175 http://orpheus.ucsd.edu/fvl/FVLPAGE.HTM</mods:publisher>
>>>       </mods:originInfo>
>>>       <mods:physicalDescription rdf:parseType="Resource">
>>>           <mods:digitalOrigin>reformatted digital</mods:digitalOrigin>
>>>           <mods:note>16mm; 1 film reel (25 min.) :; sd., col. ;</mods:note>
>>>       </mods:physicalDescription>
>>>       <mods:subject rdf:parseType="Resource">
>>>           <mods:authority>lcsh</mods:authority>
>>>           <mods:topic>Ranching</mods:topic>
>>>       </mods:subject>
>>> 
>>> etc.
>>> 
>>> 
>>> There is definitely some loss in the conversion process -- I don't know 
>>> enough about the MARC leader and control fields to know if they are 
>>> captured in the MODS and/or RDF in some way.  But there are quite a few 
>>> local and note fields that aren't present in the RDF.  Other fields (e.g. 
>>> 300 and 505) are mapped to MODS, but not displayed in our access system 
>>> (though they are indexed for searching).
>>> 
>>> I agree it's hard to quantify lossy-ness.  Counting fields or characters 
>>> would be the most objective, but has obvious problems with control 
>>> fields sometimes containing a lot of information, and then the relative 
>>> importance of different fields to the overall description.  There are other 
>>> issues too -- some fields in this record weren't migrated because they 
>>> duplicated collection-wide values, which are formulated slightly 
>>> differently from the MARC record.  Some fields weren't migrated because 
>>> they concern the physical object, and therefore don't really apply to the 
>>> digital object.  So that really seems like a morass to me.
>>> 
>>> -Esme
>>> --
>>> Esme Cowles <[email protected]>
>>> 
>>> "Necessity is the plea for every infringement of human freedom. It is 
>>> the  argument of tyrants; it is the creed of slaves." -- William 
>>> Pitt, 1783
>>> 
>>> On 12/3/2011, at 10:35 AM, Karen Coyle wrote:
>>> 
>>>> Esme, let me second Owen's enthusiasm for more detail if you can 
>>>> supply it. I think we also need to start putting these efforts along 
>>>> a "loss" continuum - MODS is already lossy vis-a-vis MARC, and my 
>>>> guess is that some of the other MARC->RDF transforms don't include 
>>>> all of the warts and wrinkles of MARC. LC's new bibliographic 
>>>> framework document sets as a goal to bring along ALL of MARC (a 
>>>> decision that I think isn't obvious, as we have already discussed 
>>>> here). If we say we are going from MARC to RDF, how much is actually 
>>>> captured in the transformed data set? (Yes, that's going to be hard 
>>>> to quantify.)
>>>> 
>>>> kc
>>>> 
>>>> Quoting Esme Cowles <[email protected]>:
>>>> 
>>>>> Owen-
>>>>> 
>>>>> Another strategy for capturing MARC data in RDF is to convert it to MODS 
>>>>> (we do this using the LoC MARC to MODS stylesheet: 
>>>>> http://www.loc.gov/standards/marcxml/xslt/MARC21slim2MODS.xsl).  From 
>>>>> there, it's pretty easy to incorporate into RDF.  There are some issues 
>>>>> to be aware of, such as how to map the MODS XML names to predicates and 
>>>>> how to handle elements that can appear in multiple places in the 
>>>>> hierarchy.
>>>>> 
>>>>> -Esme
>>>>> --
>>>>> Esme Cowles <[email protected]>
>>>>> 
>>>>> "Necessity is the plea for every infringement of human freedom. It 
>>>>> is the argument of tyrants; it is the creed of slaves." -- William 
>>>>> Pitt, 1783
>>>>> 
>>>>> On 11/28/2011, at 8:25 AM, Owen Stephens wrote:
>>>>> 
>>>>>> It would be great to start collecting transforms together - just a 
>>>>>> quick brain dump of some I'm aware of
>>>>>> 
>>>>>> MARC21 transformations
>>>>>> Cambridge University Library - http://data.lib.cam.ac.uk - 
>>>>>>   transformation made available (in code) from same site
>>>>>> Open University - http://data.open.ac.uk - specific transform for 
>>>>>>   materials related to teaching, code available at 
>>>>>>   http://code.google.com/p/luceroproject/source/browse/trunk%20luceroproject/OULinkedData/src/uk/ac/open/kmi/lucero/rdfextractor/RDFExtractor.java
>>>>>>   (MARC transform is in libraryRDFExtraction method)
>>>>>> COPAC - small set of records from the COPAC Union catalogue - data 
>>>>>>   and transform not yet published
>>>>>> Podes Projekt - LinkedAuthors - documentation at 
>>>>>>   http://bibpode.no/linkedauthors/doc/Pode-LinkedAuthors-Documentation.pdf
>>>>>>   - 2 stage transformation firstly from MARC to FRBRized version of 
>>>>>>   data, then from FRBRized data to RDF. These linked from 
>>>>>>   documentation
>>>>>> Podes Project - LinkedNonFiction - documentation at 
>>>>>>   http://bibpode.no/linkednonfiction/doc/Pode-LinkedNonFiction-Documentation.pdf
>>>>>>   - MARC data transformed using xslt 
>>>>>>   https://github.com/pode/LinkedNonFiction/blob/master/marcslim2n3.xsl
>>>>>> 
>>>>>> British Library British National Bibliography - 
>>>>>>   http://www.bl.uk/bibliographic/datafree.html - data model 
>>>>>>   documented, but no code available
>>>>>> Libris.se - some notes in various presentations/blogposts (e.g. 
>>>>>>   http://dc2008.de/wp-content/uploads/2008/09/malmsten.pdf) but 
>>>>>>   can't find explicit transformation
>>>>>> Hungarian National library - 
>>>>>>   http://thedatahub.org/dataset/hungarian-national-library-catalog 
>>>>>>   and http://nektar.oszk.hu/wiki/Semantic_web#Implementation - some 
>>>>>>   information on ontologies used but no code or explicit 
>>>>>>   transformation (not 100% sure this is from MARC)
>>>>>> Talis - implemented in several live catalogues including 
>>>>>>   http://catalogue.library.manchester.ac.uk/ - no documentation or 
>>>>>>   code afaik although some notes in
>>>>>> 
>>>>>> MAB transformation
>>>>>> HBZ - some of the transformation documented at 
>>>>>> https://wiki1.hbz-nrw.de/display/SEM/Converting+the+Open+Data+from+the+hbz+to+BIBO,
>>>>>>  don't think any code published?
>>>>>> 
>>>>>> Would be really helpful if more projects published their 
>>>>>> transformations (or someone told me where to look!)
>>>>>> 
>>>>>> Owen
>>>>>> 
>>>>>> Owen Stephens
>>>>>> Owen Stephens Consulting
>>>>>> Web: http://www.ostephens.com
>>>>>> Email: [email protected]
>>>>>> Telephone: 0121 288 6936
>>>>>> 
>>>>>> On 26 Nov 2011, at 15:58, Karen Coyle wrote:
>>>>>> 
>>>>>>> A few of the code4lib talk proposals mention projects that have or will 
>>>>>>> transform MARC records into RDF. If any of you have documentation 
>>>>>>> and/or examples of this, I would be very interested to see them, even 
>>>>>>> if they are "under construction."
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> kc
>>>>>>> 
>>>>>>> --
>>>>>>> Karen Coyle
>>>>>>> [email protected] http://kcoyle.net
>>>>>>> ph: 1-510-540-7596
>>>>>>> m: 1-510-435-8234
>>>>>>> skype: kcoylenet
