Thanks Michael. So one weird thing is that at least some of those characters "specifically designated as control characters" aren't what everyone else ordinarily considers "control characters". To me, "control character" means ASCII less than 20(hex). Which the last four aren't. So now it's unclear what the "prohibited" (by not being mentioned) control characters are, since I don't know what MARC considers a 'control character' exactly.

But I'm really just picking nits to demonstrate the impenetrability of MARC specs. I believe you all (especially Terry) that CR and LF aren't allowed.

Second thing, Michael: are you the doran in this? http://rocky.uta.edu/doran/charsets/marc8default.html

You might want to remove CR, LF, and the other disallowed control characters from your own published list of MARC8 characters!

On 5/19/2011 3:16 PM, Doran, Michael D wrote:
Is it really true that newline characters are not allowed in a marc
value?
Yes.

   CONTROL FUNCTION CODES [1]

   Eight characters are specifically designated as control characters for MARC 21 use:

   - escape character, 1B(hex) in MARC-8 and Unicode encoding
   - subfield delimiter, 1F(hex) in MARC-8 and Unicode encoding
   - field terminator, 1E(hex) in MARC-8 and Unicode encoding
   - record terminator, 1D(hex) in MARC-8 and Unicode encoding
   - non-sorting character(s) begin, 88(hex) in MARC-8 and 98(hex) in Unicode encoding
   - non-sorting character(s) end, 89(hex) in MARC-8 and 9C(hex) in Unicode encoding
   - joiner, 8D(hex) in MARC-8 and 200D (hex) in Unicode encoding
   - nonjoiner, 8E(hex) in MARC-8 and 200C (hex) in Unicode encoding.

[1] http://www.loc.gov/marc/specifications/specchargeneral.html#controlfunction
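A rough way to check a field value against the list above is sketched below in Python. It is only a sketch under stated assumptions: "control character" is taken to mean a code point below 20(hex), the four low-valued entries in the list are treated as the only permitted ones, and the function name `undesignated_controls` is made up for illustration. It does not check the 88/89/8D/8E (MARC-8) or 98/9C/200D/200C (Unicode) codes.

```python
# Assumption: a "control character" is any code point below 20(hex), and the
# only permitted ones are the four low-valued codes from the list above.
DESIGNATED_C0 = {
    0x1B,  # escape character
    0x1D,  # record terminator
    0x1E,  # field terminator
    0x1F,  # subfield delimiter
}


def undesignated_controls(value):
    """Return (index, code point) pairs for controls outside DESIGNATED_C0.

    Hypothetical helper for illustration; not part of any MARC library.
    """
    return [
        (i, ord(ch))
        for i, ch in enumerate(value)
        if ord(ch) < 0x20 and ord(ch) not in DESIGNATED_C0
    ]


# A value containing CR and LF is flagged; a plain value is not.
flagged = undesignated_controls("first part\r\nsecond part")
clean = undesignated_controls("first part second part")
```

Under those assumptions CR (0D) and LF (0A) are both reported, which matches the reading that they are not allowed in a MARC value.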

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# [email protected]
# http://rocky.uta.edu/doran/






-----Original Message-----
From: Code for Libraries [mailto:[email protected]] On Behalf Of
Jonathan Rochkind
Sent: Thursday, May 19, 2011 1:27 PM
To: [email protected]
Subject: Re: [CODE4LIB] is this valid marc ?

Is it really true that newline characters are not allowed in a marc
value?  I thought they were, not with any special meaning, just as
ordinary data.  If they're not, that's useful to know, so I don't put
any there!

I'd ask for a reference to the standard that says this, but I suspect
it's going to be some impenetrable implication of a side effect of a
subtle adjective either way.

On 5/19/2011 2:19 PM, Karen Coyle wrote:
Quoting Andreas Orphanides<[email protected]>:

Anyway, I think having these two parts of the same URL data on
separate lines is definitely Not Right, but I am not sure if it adds
up to invalid MARC.
Exactly. The CR and LF characters are NOT defined as valid in the MARC
character set and should not be used. In fact, in MARC there is no
concept of "lines", only variable length strings (usually up to 9999
char).

kc

-dre.

[1] http://www.loc.gov/marc/bibliographic/bd856.html
[2] I am not a cataloger. Don't hurt me.
[3] I am not an expert on MARC ingest or on ruby-marc. I could be wrong.

On 5/19/2011 12:37 PM, James Lecard wrote:
I'm using the ruby-marc parser (v.0.4.2) to parse some MARC files I get
from a partner.

The 856 field is split over 2 lines, causing the ruby library to ignore
it (I've patched it to overcome this issue), but I want to know whether
this kind of MARC is valid?

=LDR  00638nam  2200181uu 4500
=001  cla-MldNA01
=008  080101s2008\\\\\\\|||||||||||||||||fre||
=040  \\$aMy Provider
=041  0\$afre
=245  10$aThis Subject
=260  \\$aParis$bJ. Doe$c2008
=490  \\$aSome topic
=650  1\$aNarratif, Autre forme
=655  \7$abook$2lcsh
=752  \\$aA Place on earth
=776  \\$dParis: John Doe and Cie, 1973
=856  \2$qtext/html
=856
\\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library
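For illustration, here is a minimal Python sketch of one way to pre-process a listing like the one above before parsing. It is not the ruby-marc patch, and it rests on assumptions: every field starts with "=", any other non-blank line continues the previous field, and when the previous line holds only the tag (as with the second =856) the two spaces after the tag were lost at the break. The function name `join_continuation_lines` is hypothetical.

```python
def join_continuation_lines(text):
    """Merge lines that do not start with '=' into the preceding field line.

    Hypothetical helper; handles one record's worth of mnemonic-style text.
    """
    fields = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("=") or not fields:
            fields.append(line)
            continue
        prev = fields[-1].rstrip()
        if len(prev) == 4:
            # Previous line held only the tag (e.g. "=856"); assume the two
            # separator spaces were lost at the line break and restore them.
            fields[-1] = prev + "  " + line
        else:
            fields[-1] = fields[-1] + line
    return fields


# Placeholder data (example.org is not from the original record).
sample = "=245  10$aThis Subject\n=856\n\\\\$uhttp://example.org/\n"
joined = join_continuation_lines(sample)
```

Whether joining is the right repair depends on where the break came from. If the CR/LF is really inside the field data, the record itself needs fixing at the source.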

Thanks,

James L.

