Hi Carsten,

> I think the whole problem lies in the limited expressivity of strings.
> MARCspec is pretty much close to XPath at its approach, but without regular
> expressions and functions like first(), last() etc. But even with XPath it
> would be pretty hard to get the character before a subfield in a MARCXML file.
>
> The only solution I can think of, is using regular expressions. And I'm not
> convinced that bringing this into MARCspec is a good idea. As I already
> mentioned in the spec, MARCspec is not independent from the application using
> MARCspec. Taking regular expressions into MARCspec wouldn't make the
> application more usable, but would blow up the specification.

Agreed, therefore regular expressions or other /general/ mechanisms should not be the way to go (for specifying MARCspecs - specific implementations may realize it using a regexp implementation at hand).

Thus, yes, the limited expressiveness of strings demands that the most typical and most important "operations" on MARC records be expressible. But if it's too limited (say it could only extract fields, or has blind spots - parts of record data which cannot be accessed at all) it wouldn't be of any use.

Thus MARCspecs need a convincing approach to the peculiarities of MARC records: Subfields are not always data elements in a proper sense, sometimes they are just marks interspersed into the field content. And as Patrick pointed out, there is the presence of non-MARC delimiters (markup), which is crucial for the processing of some (sub)fields.

Many fields contain "ensembles" of subfields with one nature, accompanied by other, more data-like subfields of a different nature:

- Most subfields in 700 are a simple copy of some (hypothetical) authority record's 100, however $e and/or $4 denote the function of that person with respect to the work described by the record at hand - and repeatable $0's are just complementary to the "core" subfields, which may well be $a, $b, $c, $d, $f, $g, $j, $k, $l, $n, $p, $q, $t and $u (some of them repeatable, and don't even dare to change anything in their order). Use cases might include /selection/ based on one or more of the more data-like subfields and /reduction/ of the field to a form suitable for further processing (indexing without $e, display including $e, or with deviant formatting of $e, with reference to today's slightly silly discussion on AUTOCAT concerning photographers acting as authors and authors acting as photographers, to the perplexity of patrons ...).
- Same issue with most 77X fields: most subfields pertain to the work, some are the individual "coordinates" within this work for the part described by the given record.
- The 245 example (and also the $e in 100's) may demonstrate a need to /partition/ a field at certain spots - maybe before or after subfields meeting some content condition.
- Ubiquitous (in the specification, maybe not in the "field") are $6 and $8's.

If MARCspecs could make such interwoven fields accessible as ensembles - that would be an enormous benefit!

From my limited experience the "unclear" nature of subfields really is the hard part in MARC processing: If you delve into subfield processing too early, you get data fragments that are almost or completely impossible to reassemble into something meaningful. On the other hand, looking at fields as a whole gives you more chances to understand what they are about, but you're going to choke on the weeding out necessary to proceed.
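
To make the /selection/, /reduction/ and /partition/ use cases above a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration only: a field is modelled as a plain list of (code, value) pairs, the helper names has_subfield, reduce_field and partition_before are hypothetical, and the sample 700 data is made up. It is not MARCspec syntax and not the API of any MARC library.

```python
from typing import Iterable, List, Tuple

# Assumption: a field is just an ordered list of (subfield code, value) pairs.
Subfield = Tuple[str, str]


def has_subfield(subfields: List[Subfield], code: str, value: str) -> bool:
    """Selection: does the field carry the given code with the given value?"""
    return any(c == code and v == value for c, v in subfields)


def reduce_field(subfields: List[Subfield], drop: Iterable[str]) -> List[Subfield]:
    """Reduction: drop the listed subfield codes, keep the rest in order."""
    drop_set = set(drop)
    return [(c, v) for c, v in subfields if c not in drop_set]


def partition_before(subfields: List[Subfield], codes: Iterable[str]) -> List[List[Subfield]]:
    """Partition: start a new group before every subfield whose code is listed."""
    split_codes = set(codes)
    parts: List[List[Subfield]] = []
    for c, v in subfields:
        if c in split_codes or not parts:
            parts.append([])
        parts[-1].append((c, v))
    return parts


# Made-up 700 field: "core" subfields $a/$d plus the function marks $e/$4.
field_700 = [
    ("a", "Doe, Jane,"),
    ("d", "1950-"),
    ("e", "photographer."),
    ("4", "pht"),
]

if has_subfield(field_700, "4", "pht"):          # select photographers
    index_form = reduce_field(field_700, {"e", "4"})  # index without $e/$4
    name_part, role_part = partition_before(field_700, {"e"})
```

The point of the sketch is only that all three operations work on the field as an ordered ensemble, so subfield order is never disturbed.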

Thus, maybe due to my limited experience in MARC processing, I'd very much appreciate MARCspec as a grammar to formulate those tasks that really matter (and are hard to get 100% right). To achieve that - cf. Patrick's reply again - one or several "processing paradigms" for MARC records should serve as a base and - for clarity's sake - should be made explicit in the MARCspec specification.

Thomas