> I think the whole problem lies in the limited expressivity of strings.
> MARCspec is pretty close to XPath in its approach, but without regular
> expressions and functions like first(), last() etc. But even with XPath it
> would be pretty hard to get the character before a subfield in a MARCXML file.
> The only solution I can think of is using regular expressions. And I'm not
> convinced that bringing this into MARCspec is a good idea. As I already
> mentioned in the spec, MARCspec is not independent from the application using
> MARCspec. Taking regular expressions into MARCspec wouldn't make it
> more usable, but would blow up the specification.
Agreed, therefore regular expressions or other /general/ mechanisms
should not be the way to go for specifying MARCspecs (specific implementations
may still realize it using whatever regexp implementation is at hand).
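Just to illustrate what such an implementation-side regexp could look like for the "character before a subfield" case from the quote above, here is a minimal Python sketch. It assumes the field is available as a flat string in which every subfield starts with "$" plus its one-character code (the notation used in this thread; real MARC 21 data uses the 0x1F delimiter, and a literal "$" inside the data would confuse it). The name char_before_subfield is made up for this example and is not part of MARCspec.

```python
import re
from typing import Optional

# Assumption: subfields are written as "$" + one-character code, as in this
# thread. Real MARC 21 records use the 0x1F subfield delimiter instead.
DELIM = "$"


def char_before_subfield(field: str, code: str) -> Optional[str]:
    """Return the last non-blank character preceding the first occurrence
    of subfield `code`, or None if the subfield is absent or nothing
    precedes it."""
    pattern = r"(\S)\s*" + re.escape(DELIM + code)
    match = re.search(pattern, field)
    return match.group(1) if match else None


if __name__ == "__main__":
    f245 = "$aThe hobbit :$bor, There and back again /$cJ.R.R. Tolkien."
    print(char_before_subfield(f245, "c"))  # prints "/"
```

The point is less the regexp itself than where it lives: in the implementation, not in the MARCspec grammar.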
Thus, yes, the limited expressiveness of strings demands that the most
typical and most important "operations" on MARC records be made
expressible. But if it's too limited (say, it could only extract fields,
or it has blind spots: parts of record data which cannot be accessed at all),
it wouldn't be of any use.
Thus MARCspec needs a convincing approach to the peculiarities of MARC.
Subfields are not always data elements in a proper sense; sometimes
they are just marks interspersed into the field content.
And, as Patrick pointed out, there are non-MARC delimiters
(markup) present which are crucial for processing some (sub)fields.
Many fields contain "ensembles" of subfields of one nature, accompanied
by other, more data-like subfields of a different nature:
- Most subfields in 700 are a simple copy of some (hypothetical) authority
record's 100; however, $e and/or $4 denote the function of that person with
respect to the work described by the record at hand, and repeatable $0's
are just complementary to the "core" subfields, which may well be $a, $b, $c, $d,
$f, $g, $j, $k, $l, $n, $p, $q, $t and $u (some of them repeatable, and don't even
dare to change anything in their order). Use cases might include /selection/
based on one or more of the more data-like subfields and /reduction/ of the
field to a form suitable for further processing (indexing without $e, display
including $e, or with deviant formatting of $e, with reference to today's
slightly silly discussion on AUTOCAT concerning photographers acting as authors
and authors acting as photographers, to the perplexity of patrons ...).
- Same issue with most 77X fields: most subfields pertain to the work, while
some are the individual "coordinates" within this work for the part
described by the given record.
- The 245 example (and also the $e in 100's) may demonstrate a need to
/partition/ a field at certain spots - maybe before or after subfields
meeting some content condition.
- Ubiquitous (in the specification, maybe not in the "field") are $6 and
$8. If MARCspecs could make fields interwoven this way accessible
as ensembles, that would be an enormous benefit!
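To make the wish list above a bit more concrete, here is a rough Python sketch of the operations mentioned: /selection/ by a data-like subfield, /reduction/, /partitioning/, and grouping fields into ensembles via $8. It is not MARCspec syntax and not how any existing implementation works; it simply assumes a field is a tag plus an ordered list of (code, value) pairs. All names (Field, select, reduce_field, partition, link_groups) are made up for illustration, the example data is invented, and the $8 handling is a simplification (linking number taken as the text before any "\" field link type and any ".sequence" part).

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Subfield = Tuple[str, str]  # (code, value), kept in record order


@dataclass
class Field:
    tag: str
    subfields: List[Subfield] = field(default_factory=list)

    def values(self, code: str) -> List[str]:
        """All values of subfield `code`, in record order."""
        return [v for c, v in self.subfields if c == code]


def select(fields: List[Field], tag: str, code: str, value: str) -> List[Field]:
    """Selection: fields with `tag` that have subfield `code` equal to `value`."""
    return [f for f in fields if f.tag == tag and value in f.values(code)]


def reduce_field(f: Field, drop: Set[str]) -> Field:
    """Reduction: remove the subfield codes in `drop`, keep the rest in order."""
    return Field(f.tag, [(c, v) for c, v in f.subfields if c not in drop])


def partition(f: Field, is_break: Callable[[Subfield], bool]) -> Tuple[Field, Field]:
    """Partition: split the field before the first subfield satisfying `is_break`."""
    for i, sf in enumerate(f.subfields):
        if is_break(sf):
            return Field(f.tag, f.subfields[:i]), Field(f.tag, f.subfields[i:])
    return Field(f.tag, list(f.subfields)), Field(f.tag, [])


def link_groups(fields: List[Field]) -> Dict[str, List[Field]]:
    """Ensembles: group fields by the linking number found in their $8 values
    (simplified: drop a trailing field link type and any sequence number)."""
    groups: Dict[str, List[Field]] = defaultdict(list)
    for f in fields:
        for link in f.values("8"):
            linking_number = link.split("\\", 1)[0].split(".", 1)[0]
            groups[linking_number].append(f)
    return dict(groups)


if __name__ == "__main__":
    # Invented example data, for illustration only.
    f700 = Field("700", [("a", "Smith, John,"), ("d", "1950-"),
                         ("e", "photographer."), ("4", "pht")])
    print(select([f700], "700", "4", "pht") == [f700])  # True
    print(reduce_field(f700, {"e", "4"}).subfields)  # index form without $e/$4
    f245 = Field("245", [("a", "The hobbit :"), ("b", "or, There and back again /"),
                         ("c", "J.R.R. Tolkien.")])
    head, tail = partition(f245, lambda sf: sf[0] == "c")
    print(head.subfields, tail.subfields)
```

Keeping the subfields as an ordered list rather than a code-to-value map is deliberate: it is what lets reduction and partitioning respect the order and repetitions that the 700 case warns about.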
From my limited experience, the "unclear" nature of subfields really is the
hard part in MARC processing: if you delve into subfield processing too
early, you get data fragments that are almost or completely impossible to reassemble
into something meaningful. On the other hand, looking at fields as a whole
gives you more chances to understand what they are about, but you're going
to choke on the weeding out necessary to proceed.
Thus, maybe due to my limited experience in MARC processing, I'd very much
appreciate MARCspec as a grammar to formulate those tasks that really
matter (and are hard to get 100% right). To achieve that (cf.
Patrick's reply again), one or several "processing paradigms" for MARC
records should serve as a base and, for clarity's sake, should be made
explicit in the MARCspec specification.