RE: [librecat-dev] A common MARC record path language

Patrick Hochstenbach Wed, 19 Feb 2014 11:39:01 -0800

Hi Carsten

Thanks for the new spec. I think it is a great initiative to align the many 
projects that process MARC records. Here are some general remarks that I hope 
we can use to discuss the spec in more depth.


What I'm missing when reading the specification is a separate use-case 
document. In the spec I see sections like the introduction of "2 Expressing 
MARCspecs as string" and "2.1", which are design concerns that require a 
discussion separate from the formal part of the document. I mean, I can agree 
or disagree with the design concerns, but for the formal section I should be 
able to say whether it is correct or not.

The discussion we are having in this email thread deserves a separate document 
of use cases. Producing Linked Data is only one of the cases. Solrmarc is about 
transforming MARC into something that can be sent to Solr. In ILS systems you 
might use it to point to the parts of MARC you want to display in a web 
interface. In Catmandu you might want to produce reports. Every use case can 
have its own needs for making parts of MARC easily addressable.

We need tools like easyM2R, solrmarc and catmandu not only because of the 
verbosity of XPath or because it is tied to one possible serialization of 
MARC. Of course I love to write

100$a instead of /marc:record//marc:datafield[@tag='100']/marc:subfield[@code='a']

This opens up a new class of easy DSL tools to process our datasets. 
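
To make the idea concrete, here is a minimal sketch of how a tool might resolve 
such a short path. It assumes a hypothetical in-memory record layout and handles 
only a three-character tag plus an optional one-character subfield code; the 
names parsePath and lookup are made up for illustration and are not taken from 
the spec or from any of the tools mentioned.

```javascript
// Hypothetical in-memory record: a list of fields, each with a tag and
// an ordered list of [code, value] subfield pairs (an assumed layout).
const record = [
  { tag: "100", subfields: [["a", "Mozart, Wolfgang Amadeus,"], ["d", "1756-1791."]] },
  { tag: "245", subfields: [["a", "Concerto per piano n. 21, K 467"]] },
];

// Parse a reference such as "100$a" or "245" into its parts.
// Only a tag plus an optional single subfield code is supported here,
// which is far less than the MARCspec draft covers.
function parsePath(path) {
  const m = /^([0-9A-Za-z]{3})(?:\$([0-9a-z]))?$/.exec(path);
  if (!m) throw new Error("unsupported path: " + path);
  return { tag: m[1], code: m[2] === undefined ? null : m[2] };
}

// Return all matching subfield values (every subfield if no code is given).
function lookup(record, path) {
  const { tag, code } = parsePath(path);
  const values = [];
  for (const field of record) {
    if (field.tag !== tag) continue;
    for (const [c, v] of field.subfields) {
      if (code === null || c === code) values.push(v);
    }
  }
  return values;
}
```

Calling lookup(record, "100$a") returns the values of every $a in every 100 
field, independent of how the record was serialized.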

But this treats MARC as a key-value document exchange format for 
bibliographic data. And I can't agree with that, or at least not in a strict 
sense. I can just as easily state that MARC is a mark-up language that requires 
more processing after the first mappings have been made. E.g. if you want to 
map 260$c to an xsd:date field you really need to get rid of the trailing dot 
'.' at the end. MARC is a key-value exchange format only as a first 
approximation.
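
As a small illustration of that post-processing step, the sketch below strips 
the trailing dot from a 260$c value. The function name stripTrailingDot is 
hypothetical, and real 260$c values need more cleanup than this before they 
can be used as a typed date.

```javascript
// Remove a trailing period (and any whitespace around it) from a
// subfield value such as the content of 260$c.
// This is only the single cleanup step mentioned above.
function stripTrailingDot(value) {
  return value.replace(/\s*\.\s*$/, "");
}
```

For example, stripTrailingDot("2014.") gives "2014", while a value without a 
trailing dot is returned unchanged.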

Using cataloging rules you can get much more information out of the record. And 
I wonder if in a second approximation we could add paths that implement some of 
that logic.

For instance, as a silly example:

245{/$.} : could evaluate to everything in 245 until you hit the first 
/$$subfield

In Catmandu, well, we don't have a spec for that. We do the same thing as 
easyM2R and solrmarc and create a small DSL of functions that take MARCspecs 
as input. Of course we could all agree on the same collection of functions 
like move_field, split_field, copy_field etc. But I hope there are other 
options as well.

Cheers
Patrick

________________________________________
From: Klee, Carsten [carsten.k...@sbb.spk-berlin.de]
Sent: Wednesday, February 19, 2014 2:27 PM
To: 't...@gymel.com'; Patrick Hochstenbach
Cc: v...@gbv.de; librecat-...@mail.librecat.org; perl4lib@perl.org
Subject: AW: [librecat-dev] A common MARC record path language

Hi Thomas and Patrick!

I think the whole problem lies in the limited expressivity of strings. MARCspec 
is quite close to XPath in its approach, but without regular expressions and 
functions like first(), last() etc. But even with XPath it would be pretty 
hard to get the character before a subfield in a MARCXML file.

The only solution I can think of is using regular expressions. And I'm not 
convinced that bringing this into MARCspec is a good idea. As I already 
mentioned in the spec, MARCspec is not independent from the application using 
MARCspec. Taking regular expressions into MARCspec wouldn't make the 
application more usable, but would blow up the specification.

One example:

The data in field 245 is:

"$aConcerto per piano n. 21, K 467$h[sound recording] /$cW.A. Mozart"

The desired result is (rule: take everything from 245 until the string ' /$' 
appears):

"Concerto per piano n. 21, K 467 [sound recording]"

Imagine a MARCspec with a regular expression. // pseudo code coming up!

marcspec = "245.match(/(.*)\s\/\$/)"
titleData = getMARCspec(record, marcspec)
print titleData[1]
// should result in "$aConcerto per piano n. 21, K 467$h[sound recording]"

Now pretty much the same, but without the regular expression in the MARCspec.

marcspec = "245"
titleData = getMARCspec(record, marcspec).match(/(.*)\s\/\$/)
print titleData[1]
// should result in "$aConcerto per piano n. 21, K 467$h[sound recording]"

You see, nothing is gained here.

But an application could provide a special function like

function takeEverythingFromSpecUntilYouHitBeforeSubfield(marcspec, hitWhat, record)
{
    // get the data before the / or = or else
    var regex = new RegExp("(.*)\\s\\" + hitWhat + "\\$")
    var data = getMARCspec(record, marcspec).match(regex)[1]

    // now split on subfield codes; element 0 is the empty string
    // in front of the first subfield code
    var dataSplit = data.split(/\$[a-z0-9]/)

    // loop everything into result, separated by single spaces
    var result = ""
    for (var i = 1; i < dataSplit.length - 1; i++)
    {
        result += dataSplit[i] + " "
    }
    // the last element sits at index length - 1
    result += dataSplit[dataSplit.length - 1]

    return result
}

In Catmandu or elsewhere the user calls the function

takeEverythingFromSpecUntilYouHitBeforeSubfield("245","/",record)

--> this should result in the desired "Concerto per piano n. 21, K 467 [sound 
recording]".
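
A variant that avoids the round trip through a serialized string could look 
like the sketch below. It assumes the application already holds field 245 as an 
ordered list of [code, value] pairs; that layout and the name takeUntilMark are 
hypothetical and only serve to illustrate the idea.

```javascript
// Hypothetical parsed form of field 245: an ordered list of
// [code, value] pairs (an assumed layout for this sketch).
const field245 = [
  ["a", "Concerto per piano n. 21, K 467"],
  ["h", "[sound recording] /"],
  ["c", "W.A. Mozart"],
];

// Collect subfield values until one ends with the given mark
// (for example "/" or "="), drop the mark, and join with spaces.
function takeUntilMark(subfields, mark) {
  const parts = [];
  for (const [, value] of subfields) {
    const trimmed = value.trimEnd();
    if (trimmed.endsWith(mark)) {
      const head = trimmed.slice(0, -mark.length).trimEnd();
      if (head) parts.push(head);
      return parts.join(" ");
    }
    parts.push(trimmed);
  }
  // Mark never found: return every subfield value.
  return parts.join(" ");
}
```

With the example data, takeUntilMark(field245, "/") yields the same title 
string, and the mark is passed as a plain string, so no regular expression 
escaping is needed.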

If there is any other approach you can think of, pleeeease make a proposal or 
start a substantial discussion here. Otherwise I can't see any option for 
solving this problem in MARCspec.

Cheers!

Carsten
_______________________________________________
Carsten Klee
Abt. Überregionale Bibliographische Dienste IIE
Staatsbibliothek zu Berlin - Preußischer Kulturbesitz

Fon:  +49 30 266-43 44 02

> -----Original Message-----
> From: Thomas Berger [mailto:t...@gymel.com]
> Sent: Wednesday, 19 February 2014 01:04
> To: Klee, Carsten; 'Patrick Hochstenbach'
> Cc: v...@gbv.de; librecat-...@mail.librecat.org; perl4lib@perl.org
> Subject: Re: [librecat-dev] A common MARC record path language
>
> On 18.02.2014 17:47, Klee, Carsten wrote:
>
> > I understand that there is MARC data combined with cataloging rules. We
> > don't use this approach within our MARC. So I'm not really aware of the
> > problems involved.
>
> "Your" MARC however will be very much interested in "/" (or "=") as the
> first character of some subfield in 245, if I recall correctly. Not such a
> big difference, I would think. But maybe a slight complication of the
> matter, since MARCspec would have to cope with both approaches...
>
> Thomas Berger
