RE: [librecat-dev] A common MARC record path language

2014-02-25 Thread PHILLIPS M.E.
 You could also consider reading Jason Thomale's "Interpreting MARC:
 Where's the Bibliographic Data?"  http://journal.code4lib.org/articles/3832

That's a very good article, as it highlights how the prescribed punctuation 
both gets in the way of extracting parts of the data and provides extra 
context for the subfields.

 It is not a MOM (MARC Object Model), or rather an object model for
 any format derived from ISO 2709 and its concepts of files, records,
 (flavors of) fields and subfields, and therefore no abstract API
 can be specified (one prescribing that some operation X is defined on
 record objects and yields field objects).

If we are talking about ISO 2709 as a whole, that is, the entire family of 
MARC formats, then you have to remember that UNIMARC and obsolete formats 
like UKMARC have very different requirements.  UKMARC and UNIMARC are actually 
much easier to work with than MARC21 because the ISBD punctuation is not 
carried in the record but is generated from the subfield tags.  So you don't 
have to say "give me the 245 $a and $b, but strip the / off the end if 
present", because the slash is not there.  And there is a different subfield 
tag to introduce a parallel title, so you don't need to distinguish :$b from 
=$b.
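
To make the MARC21 side of that comparison concrete, here is a minimal Python
sketch of the "strip the slash off the end" step. The list of separators, the
function names and the sample field are assumptions made for illustration;
they are not taken from any MARC library.

```python
import re

# Assumed, non-exhaustive set of ISBD separators that MARC21 records
# conventionally carry at the end of a 245 subfield: "/", ":", ";", "=".
TRAILING_ISBD = re.compile(r"\s*[/:;=]\s*$")

def strip_isbd(value):
    """Remove one trailing ISBD separator and any surrounding whitespace."""
    return TRAILING_ISBD.sub("", value).strip()

def title_parts(subfields):
    """Return the cleaned $a and $b values of a 245 field.

    subfields is a list of (code, value) pairs in record order.
    """
    return [strip_isbd(value) for code, value in subfields if code in ("a", "b")]

# Hypothetical MARC21 245 field with ISBD punctuation carried in the data.
field_245 = [
    ("a", "Interpreting MARC :"),
    ("b", "where's the bibliographic data? /"),
    ("c", "Jason Thomale."),
]

print(title_parts(field_245))
```

This is only a heuristic: it cannot tell a prescribed separator from
punctuation that was transcribed from the title page, so a title that
genuinely ends in a colon would lose it.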

In the UK most libraries have been using MARC21 for a decade or more now.  I 
don't know how much use is still made of UNIMARC or the other national 
formats, nor how good they were.  It seems that over the last twenty years 
many countries have moved towards MARC21 because of the sheer number of 
records available in that format.  It's just a pity that it's possibly the 
worst of the ISO 2709 formats to work with if you want to repurpose the data!

I hope that BIBFRAME is not going to make the same mistakes.  I have not been 
following that initiative in detail, but I've seen a few examples of data with 
punctuation hanging about at the end.  It is hard to tell whether it's 
prescribed punctuation or was copied from the book.

The title field, in particular, is much more akin to HTML markup than data 
fields in a database.  In antiquarian cataloguing rules like DCRM, the emphasis 
is on exact transcription from the title page, where the presence or absence of 
punctuation can make a difference in identifying variant editions.  In MARC21 
we get the crazy situation where the cataloguers transcribe the exact 
punctuation from the title page and *add* the ISBD punctuation to the MARC21 
record.  This makes it very hard to present the layperson with anything 
meaningful.

Matthew

-- 
Matthew Phillips
Head of Digital and Bibliographic Services,
Durham University Library, Stockton Road, Durham, DH1 3LY
+44 (0)191 334 2941



Re: [librecat-dev] A common MARC record path language

2014-02-25 Thread Thomas Berger

On 25.02.2014 12:50, PHILLIPS M.E. wrote:

 If we are just talking about ISO 2709, the whole family of MARC formats in
 general, then you have to remember that UNIMARC and obsolete formats like
 UKMARC have very different requirements. UKMARC and UNIMARC are actually
 much easier to work with than MARC21 because the ISBD punctuation is not
 carried in the record but is generated from the subfield tags. So you don't
 have to say give me the 245 $a and $b but strip / off the end if present
 because the slash is not there.

The same is possible with MARC21: the punctuation regime for the record is
governed by Leader position 18 (descriptive cataloging form), which currently
offers a choice mainly between AACR2, ISBD with punctuation and ISBD without
punctuation, and does not yet have code(s) for RDA.
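
As a rough illustration, a consumer could check Leader position 18 before
deciding whether to strip punctuation. The Python sketch below assumes the
leader is available as a 24-character string; the code table is a partial
reading of the MARC 21 leader documentation and may not be complete, and the
function name and sample leader are made up for the example.

```python
# Partial table of Leader/18 values (descriptive cataloging form).
# Assumption: codes not listed here are reported as "other/undefined".
DESC_CATALOGING_FORM = {
    " ": "Non-ISBD",
    "a": "AACR 2",
    "c": "ISBD punctuation omitted",
    "i": "ISBD punctuation included",
    "u": "Unknown",
}

def punctuation_regime(leader):
    """Describe the punctuation regime announced by a MARC21 leader."""
    if len(leader) != 24:
        raise ValueError("a MARC leader is exactly 24 characters long")
    # Leader positions are counted from 0, so position 18 is index 18.
    return DESC_CATALOGING_FORM.get(leader[18], "other/undefined")

# Hypothetical leader with "i" in position 18.
sample_leader = "00000nam a2200000 i 4500"
print(punctuation_regime(sample_leader))
```

A record coded "c" could then skip the punctuation-stripping step entirely,
while one coded "i" or "a" would still need it.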

Here in Germany there is a strong tradition that cataloguers shall not enter
punctuation when the field granularity of the underlying database allows its
automatic generation for display or conversion to other formats
(what I mean is: punctuation is generated when converting from the internal
format to MARC in cases where MARC is not as granular as the internal format).

This applies to RAK data in the union databases and its transport via MAB2 or
MARC21 and it is also the intention to carry this on when switching from RAK
to RDA.
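
A small sketch of what "automatic generation for display" can look like,
assuming a hypothetical internal record with one key per title element (the
key names are invented here and do not correspond to RAK, MAB2 or any real
schema). The ISBD separators are added only when the display string is
built, and the element order is simplified.

```python
# ISBD separator that precedes each element of the title area.
ISBD_PREFIX = {
    "parallel_title": " = ",
    "other_title_info": " : ",
    "responsibility": " / ",
}

def render_title(record):
    """Build an ISBD-style display string from punctuation-free elements."""
    out = record["title"]
    for key in ("parallel_title", "other_title_info", "responsibility"):
        value = record.get(key)
        if value:
            out += ISBD_PREFIX[key] + value
    return out

# Hypothetical internal record: no punctuation is stored with the data.
record = {
    "title": "Titel",
    "parallel_title": "Title",
    "responsibility": "A. Author",
}

print(render_title(record))
```

Because the stored values stay clean, the same record can feed a display, a
MARC export with punctuation, or an export without it.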

[There's also been the regulation for the D-A-CH application layer to move
punctuation which cannot be eliminated to the start of the subfield it
belongs to, e.g.

245 $a title = $b parallel title

becomes

245 $a title $b = parallel title

presumably in the hope that this would ease processing...]
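
The transformation shown above can be sketched in Python as follows. This is
an illustration only, not the D-A-CH rule itself: the separator list and the
function name are assumptions, and subfields are modelled as simple
(code, value) pairs.

```python
import re

# Assumed set of separators that may trail a subfield value.
TRAILING = re.compile(r"\s*([=:/;])\s*$")

def move_punctuation(subfields):
    """Move a trailing separator to the start of the following subfield."""
    values = [value for _, value in subfields]
    # The last subfield has no successor, so it is left untouched.
    for i in range(len(values) - 1):
        match = TRAILING.search(values[i])
        if match:
            values[i] = values[i][:match.start()]
            values[i + 1] = match.group(1) + " " + values[i + 1]
    return [(code, value) for (code, _), value in zip(subfields, values)]

before = [("a", "title ="), ("b", "parallel title")]
print(move_punctuation(before))
```

With the separator at the start of $b, a consumer can read $a without any
clean-up and can tell from $b alone that it holds a parallel title.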

Best regards
Thomas Berger