Thanks Michael. So one weird thing is that at least some of those
characters "specifically designated as control characters" aren't
ordinarily what everyone else considers "control characters". To me,
"control character" means ASCII below 20 (hex), which the last four
aren't. So now it's unclear what the "prohibited" (by not being
mentioned) control characters are, since I don't know what MARC
considers a 'control character' exactly.
But I'm really just picking nits to demonstrate the impenetrability of
MARC specs. I believe you all (especially Terry) that CR and LF aren't
allowed.
But, two, Michael, are you the doran in this?
http://rocky.uta.edu/doran/charsets/marc8default.html
You might want to remove CR, LF, and the other disallowed control
characters from your own published list of MARC8 characters!
On 5/19/2011 3:16 PM, Doran, Michael D wrote:
Is it really true that newline characters are not allowed in a marc
value?
Yes.
CONTROL FUNCTION CODES [1]
Eight characters are specifically designated as control characters for MARC
21 use:
- escape character, 1B(hex) in MARC-8 and Unicode encoding
- subfield delimiter, 1F(hex) in MARC-8 and Unicode encoding
- field terminator, 1E(hex) in MARC-8 and Unicode encoding
- record terminator, 1D(hex) in MARC-8 and Unicode encoding
- non-sorting character(s) begin, 88(hex) in MARC-8 and 98(hex) in Unicode
encoding
- non-sorting character(s) end, 89(hex) in MARC-8 and 9C(hex) in Unicode
encoding
- joiner, 8D(hex) in MARC-8 and 200D (hex) in Unicode encoding
- nonjoiner, 8E(hex) in MARC-8 and 200C (hex) in Unicode encoding.
[1] http://www.loc.gov/marc/specifications/specchargeneral.html#controlfunction
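
As a rough illustration of the list above, here is a minimal Python sketch that flags control characters in a Unicode-encoded field value that are not among the eight designated ones. It assumes that "control character" means Unicode general category Cc (the C0 and C1 ranges plus DEL), which is an interpretation and not something the specification states; the function name and the set name are hypothetical.

```python
import unicodedata

# The eight control characters designated for MARC 21 use, in their
# Unicode-encoding forms, as given in the list above.
MARC_UNICODE_CONTROLS = {
    "\x1b",    # escape character
    "\x1f",    # subfield delimiter
    "\x1e",    # field terminator
    "\x1d",    # record terminator
    "\x98",    # non-sorting character(s) begin
    "\x9c",    # non-sorting character(s) end
    "\u200d",  # joiner
    "\u200c",  # nonjoiner
}


def undesignated_controls(value):
    """Return (offset, code point) for each category-Cc character in
    `value` that is not in the designated set.

    Assumption: "control character" is read as Unicode category Cc.
    """
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(value)
        if unicodedata.category(ch) == "Cc" and ch not in MARC_UNICODE_CONTROLS
    ]


# CR and LF are reported; the subfield delimiter is not.
print(undesignated_controls("\x1fuhttp://example.org/\r\npath"))
```

Note that the joiner and nonjoiner are category Cf in Unicode, not Cc, so they would pass this check even without being in the set. The sketch also does not check whether a field or record terminator appears in the middle of the data, where it would not belong either.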
-- Michael
# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# [email protected]
# http://rocky.uta.edu/doran/
-----Original Message-----
From: Code for Libraries [mailto:[email protected]] On Behalf Of
Jonathan Rochkind
Sent: Thursday, May 19, 2011 1:27 PM
To: [email protected]
Subject: Re: [CODE4LIB] is this valid marc ?
Is it really true that newline characters are not allowed in a marc
value? I thought they were, not with any special meaning, just as
ordinary data. If they're not, that's useful to know, so I don't put
any there!
I'd ask for a reference to the standard that says this, but I suspect
it's going to be some impenetrable implication of a side effect of a
subtle adjective either way.
On 5/19/2011 2:19 PM, Karen Coyle wrote:
Quoting Andreas Orphanides<[email protected]>:
Anyway, I think having these two parts of the same URL data on
separate lines is definitely Not Right, but I am not sure if it adds
up to invalid MARC.
Exactly. The CR and LF characters are NOT defined as valid in the MARC
character set and should not be used. In fact, in MARC there is no
concept of "lines", only variable length strings (usually up to 9999
char).
kc
-dre.
[1] http://www.loc.gov/marc/bibliographic/bd856.html
[2] I am not a cataloger. Don't hurt me.
[3] I am not an expert on MARC ingest or on ruby-marc. I could be wrong.
On 5/19/2011 12:37 PM, James Lecard wrote:
I'm using the ruby-marc parser (v.0.4.2) to parse some MARC files I get
from a partner.
The 856 field is split over 2 lines, causing the library to ignore
it (I've patched it to overcome this issue), but I want to know: is this
kind of MARC valid?
=LDR 00638nam 2200181uu 4500
=001 cla-MldNA01
=008 080101s2008\\\\\\\|||||||||||||||||fre||
=040 \\$aMy Provider
=041 0\$afre
=245 10$aThis Subject
=260 \\$aParis$bJ. Doe$c2008
=490 \\$aSome topic
=650 1\$aNarratif, Autre forme
=655 \7$abook$2lcsh
=752 \\$aA Place on earth
=776 \\$dParis: John Doe and Cie, 1973
=856 \2$qtext/html
=856
\\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library
Thanks,
James L.
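
For a text dump like the one above, one possible workaround is to fold stray continuation lines back into the field they belong to before doing anything else with the data. This is only a sketch under assumptions: it assumes every field in the dump starts with "=" followed by a tag, as in the listing above, and that any line not starting with "=" is the tail of the previous field. It is not James's ruby-marc patch, and `join_continuations` is a hypothetical name.

```python
import re

# A line holding nothing but a tag, such as "=856" in the dump above.
BARE_TAG = re.compile(r"^=(LDR|\d{3})$")


def join_continuations(lines):
    """Merge lines that do not start with '=' into the preceding field line."""
    fields = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("=") or not fields:
            fields.append(line)
            continue
        prev = fields[-1]
        if BARE_TAG.match(prev.strip()):
            # The break fell right after the tag: restore the separator.
            fields[-1] = prev.strip() + " " + line.lstrip()
        else:
            # The break fell inside the data: drop only the line break.
            fields[-1] = prev + line
    return fields


dump = [
    r"=856 \2$qtext/html",
    "=856",
    r"\\$uhttp://www.this-link-will-not-be-retrieved-by-ruby-marc-library",
]
for field in join_continuations(dump):
    print(field)
```

With the three 856 lines from the record above, this yields two fields, the second being the bare "=856" rejoined with its indicators and $u subfield. Whether dropping the line break is the right repair for a break inside the data is a guess; the supplier fixing the export would be the cleaner solution.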