Hi Carsten,

Thanks for the new spec. I think it is a great initiative to align the many projects that process MARC records. Here are some general remarks I hope we can use to discuss the spec in more depth.
What I'm missing when reading the specification is a separate use-case document. In the spec I see sections like the introduction of "2 Expressing MARCspecs as string" and "2.1", which are design concerns that require a discussion separate from the formal part of the document. I can agree or disagree with the design concerns; with the formal section I should be able to say whether it is correct or not.

The discussion we are having in this email thread deserves a separate document of use cases. Producing Linked Data is only one of them. Solrmarc is about transforming MARC into something that can be sent to Solr. In ILS systems you might use it to point to the parts of MARC you want to display in a web interface. In Catmandu you might want to produce reports. Every use case can have its own needs for making parts of MARC easily addressable.

We need tools like easyM2R, solrmarc and Catmandu not only because XPath is verbose or because it is tied to one possible serialization of MARC. Of course I love to write 100$a instead of /marc:record/marc:datafield[@tag='100']/marc:subfield[@code='a']. This opens up a new class of easy DSL tools to process our datasets. But this treats MARC as a key-value document exchange format for bibliographic data, and I can't agree with that, or at least not in a strict sense. I can just as easily state that MARC is a markup language that requires more processing after the first mappings have been made. E.g. if you want to map 260$c to an xsd:date field, you really need to get rid of the trailing dot '.' at the end.

MARC is a key-value exchange format only as a first approximation. Using cataloging rules you can get much more information out of the record, and I wonder whether, in a second approximation, we could add paths that implement some of that logic. For instance, as a silly example:

    245{/$.}

could evaluate to everything in 245 until you hit the first " /" followed by a subfield delimiter. As for Catmandu... well, we don't have a spec for that.
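Patrick's 260$c point can be made concrete with a short sketch. Assuming the 260$c value is already available as a plain string, a cleanup step strips the trailing ISBD punctuation before the value goes into a typed field. The names stripTrailingPunctuation and yearFrom260c are hypothetical (not part of MARCspec, Catmandu, solrmarc or easyM2R), and extracting only a four-digit year is a simplifying assumption:

```javascript
// Hypothetical helper, not part of MARCspec, Catmandu, solrmarc or easyM2R:
// remove a trailing run of whitespace and ISBD punctuation (. , ; : / =).
function stripTrailingPunctuation(value) {
  return value.replace(/[\s.,;:\/=]+$/, "");
}

// Hypothetical: pull a four-digit year out of a cleaned 260$c value such as
// "c1999." or "[2001?]". Returns null when no four-digit year is present.
// Assumption: only the first four-digit run is relevant.
function yearFrom260c(value) {
  const cleaned = stripTrailingPunctuation(value);
  const match = cleaned.match(/\d{4}/);
  return match ? match[0] : null;
}

console.log(stripTrailingPunctuation("1999.")); // → 1999
console.log(yearFrom260c("c1999."));            // → 1999
console.log(yearFrom260c("[2001?]"));           // → 2001
```

Note that a bare year such as 1999 fits xsd:gYear rather than a full xsd:date (which needs year, month and day), so a real mapping would also have to decide how to treat incomplete dates.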
Like easyM2R and solrmarc, Catmandu has a small DSL of functions that take MARCspecs as input. Of course we could all agree on a common collection of functions like move_field, split_field, copy_field, etc. But I hope there are other options as well.

Cheers
Patrick

________________________________________
From: Klee, Carsten [carsten.k...@sbb.spk-berlin.de]
Sent: Wednesday, February 19, 2014 2:27 PM
To: 't...@gymel.com'; Patrick Hochstenbach
Cc: v...@gbv.de; librecat-...@mail.librecat.org; perl4lib@perl.org
Subject: AW: [librecat-dev] A common MARC record path language

Hi Thomas and Patrick!

I think the whole problem lies in the limited expressivity of strings. MARCspec is pretty close to XPath in its approach, but without regular expressions and functions like first(), last() etc. But even with XPath it would be pretty hard to get the character before a subfield in a MARCXML file. The only solution I can think of is using regular expressions, and I'm not convinced that bringing them into MARCspec is a good idea.

As I already mentioned in the spec, MARCspec is not independent of the application using it. Taking regular expressions into MARCspec wouldn't make the application more usable, but it would blow up the specification.

One example: the data in field 245 is:

    "$aConcerto per piano n. 21, K 467$h[sound recording] /$cW.A. Mozart"

The desired result (rule: take everything from 245 until the string ' /$' appears) is:

    "Concerto per piano n. 21, K 467 [sound recording]"

Imagine a MARCspec with a regular expression:

    // pseudo code coming up!
    marcspec = "245.match(/(.*)\s\/\$/)"
    titleData = getMARCspec(record, marcspec)
    print titleData[1] // should result in "$aConcerto per piano n. 21, K 467$h[sound recording]"

Now pretty much the same, but without the regular expression in the MARCspec:

    marcspec = "245"
    titleData = getMARCspec(record, marcspec).match(/(.*)\s\/\$/)
    print titleData[1] // should result in "$aConcerto per piano n. 21, K 467$h[sound recording]"

You see, nothing is gained here. But an application could provide a special function like:

    function takeEverythingFromSpecUntilYouHitBeforeSubfield(marcspec, hitWhat, record) {
        // get the data before " /$" (or " =$", etc.)
        var regex = new RegExp("(.*)\\s\\" + hitWhat + "\\$")
        var data = getMARCspec(record, marcspec).match(regex)[1]
        // split on the subfield delimiters; dataSplit[0] is the empty
        // string before the first "$a", so it is skipped below
        var dataSplit = data.split(/\$[a-z0-9]/)
        // join the subfield contents with single spaces
        var result = ""
        for (var i = 1; i < dataSplit.length - 1; i++) {
            result += dataSplit[i] + " "
        }
        result += dataSplit[dataSplit.length - 1]
        return result
    }

In Catmandu or elsewhere the user calls the function

    takeEverythingFromSpecUntilYouHitBeforeSubfield("245", "/", record)

and this should result in the desired "Concerto per piano n. 21, K 467 [sound recording]".

If there is any other approach you can think of, pleeeease make a proposal or give me a substantial discussion here. Otherwise I can't see any option for solving this problem in MARCspec.

Cheers!
Carsten

_______________________________________________
Carsten Klee
Abt. Überregionale Bibliographische Dienste IIE
Staatsbibliothek zu Berlin - Preußischer Kulturbesitz
Phone: +49 30 266-43 44 02

> -----Original Message-----
> From: Thomas Berger [mailto:t...@gymel.com]
> Sent: Wednesday, 19 February 2014 01:04
> To: Klee, Carsten; 'Patrick Hochstenbach'
> Cc: v...@gbv.de; librecat-...@mail.librecat.org; perl4lib@perl.org
> Subject: Re: [librecat-dev] A common MARC record path language
>
> On 18.02.2014 17:47, Klee, Carsten wrote:
>
> > I understand that there is MARC data combined with cataloging rules. We
> > don't use this approach within our MARC, so I'm not really aware of the
> > issues.
>
> "Your" MARC however will be very much interested in "/" (or "=") as the
> first character of some subfield in 245, if I recall correctly. Not such a big
> difference, I would think.
> But maybe a slight complication of the matter,
> since MARCspec would have to cope with both approaches...
>
> Thomas Berger
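Carsten's takeEverythingFromSpecUntilYouHitBeforeSubfield works on the flattened field string with a regular expression. As an alternative sketch (an assumption for illustration, not MARCspec, Catmandu or any other library's API), the same 245 result can be computed from the field's parsed subfields, checking whether each subfield value ends with " /", which is where the ISBD punctuation sits. The names parseSubfields and takeUntilSeparator are hypothetical:

```javascript
// Hypothetical sketch: split a display string like "$aTitle$hGMD /$cAuthor"
// into [code, value] pairs. Assumes "$" only occurs as the subfield delimiter
// (in binary MARC the delimiter is 0x1F; "$" is the usual display convention).
function parseSubfields(raw) {
  return raw
    .split("$")
    .slice(1) // drop the empty string before the first delimiter
    .map((chunk) => [chunk.charAt(0), chunk.slice(1)]);
}

// Collect subfield values until one ends with " " + separator (mirroring the
// ' /$' rule above), drop that separator, and join the values with spaces.
function takeUntilSeparator(subfields, separator) {
  const marker = " " + separator;
  const parts = [];
  for (const [, value] of subfields) {
    const trimmed = value.trimEnd();
    if (trimmed.endsWith(marker)) {
      const kept = trimmed.slice(0, -marker.length).trimEnd();
      if (kept !== "") parts.push(kept);
      return parts.join(" ");
    }
    parts.push(trimmed);
  }
  return parts.join(" "); // separator never found: return everything
}

const raw245 = "$aConcerto per piano n. 21, K 467$h[sound recording] /$cW.A. Mozart";
console.log(takeUntilSeparator(parseSubfields(raw245), "/"));
// → Concerto per piano n. 21, K 467 [sound recording]
```

Because the rule inspects the end of each parsed subfield value, it does not depend on the "$" display syntax and would behave the same on subfields read from MARCXML or binary MARC.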