Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 and MARC21

Karen Coyle Wed, 18 Apr 2012 09:08:56 -0700

At the time of creation, characters and bytes were 1-to-1 because MARC used only ASCII. So there was no distinction at the outset. Some positions are still limited to ASCII characters (Leader, fixed fields, subfield codes, etc.).
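
To make the 1-to-1 point concrete, here is a minimal Python sketch. The sample strings and the helper name `char_and_byte_counts` are made up for illustration and do not come from any real record; the only assumption is that non-ASCII text is stored as UTF-8.

```python
def char_and_byte_counts(text: str) -> tuple[int, int]:
    """Return (number of characters, number of UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))


# Hypothetical sample values, chosen only to show the difference.
ascii_title = "Moby Dick"
accented_title = "Les Mis\u00e9rables"  # contains one precomposed e-acute

for title in (ascii_title, accented_title):
    chars, octets = char_and_byte_counts(title)
    print(f"{title!r}: {chars} characters, {octets} bytes")
```

With ASCII-only data the two counts agree, which is why the distinction never came up at the outset. Once a single accented letter appears, the byte count pulls ahead of the character count.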

kc


On 4/18/12 7:20 AM, Huwig,Steve wrote:

I could be mistaken (never having had the pleasure of reading it), but
isn't ISO-2709 specified as a fixed number of characters, and any
conflation of characters and 8-bit bytes is on the part of users and
implementations?

I think ISO 2709 might not know from bytes, only characters.

-----Original Message-----
From: Code for Libraries [mailto:[email protected]] On Behalf Of Doran, Michael D
Sent: Wednesday, April 18, 2012 10:05 AM
To: [email protected]
Subject: Re: [CODE4LIB] more on MARC char encoding: Now we're about
ISO_2709 and MARC21

Hi Tod,

I'm not understanding how UTF-8 would be considered 8-bit character
data (other than the ASCII-range of the Unicode repertoire, natch).  I
don't think ISO 2709 knows from characters, only bytes.

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# [email protected]
# http://rocky.uta.edu/doran/

-----Original Message-----
From: Code for Libraries [mailto:[email protected]] On Behalf Of Tod Olson
Sent: Wednesday, April 18, 2012 5:04 AM
To: [email protected]
Subject: Re: [CODE4LIB] more on MARC char encoding: Now we're about
ISO_2709 and MARC21

It has to mean UTF-8. ISO 2709 is very byte-oriented, from the
directory structure to the byte-offsets in the fixed fields. The
values in these places all assume 8-bit character data, it's
completely baked in to the file format.
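
As a rough illustration of that byte orientation, the Python sketch below builds a toy record and reads its directory back. It assumes the MARC 21 flavour of the layout (24-byte leader, base address of data in leader positions 12-16, 12-byte directory entries made of a 3-digit tag, 4-digit field length, and 5-digit starting position); other ISO 2709 profiles may use a different entry map. The function names and the sample field values are hypothetical.

```python
FT = b"\x1e"  # field terminator
RT = b"\x1d"  # record terminator
SF = "\x1f"   # subfield delimiter


def build_record(fields):
    """Assemble a toy record from (tag, data_bytes) pairs."""
    directory = b""
    body = b""
    for tag, data in fields:
        data += FT
        # Field length and starting position are both counted in bytes.
        directory += f"{tag}{len(data):04d}{len(body):05d}".encode("ascii")
        body += data
    base = 24 + len(directory) + 1   # leader + directory + its terminator
    total = base + len(body) + 1     # + record terminator
    leader = f"{total:05d}nam a22{base:05d} a 4500".encode("ascii")
    assert len(leader) == 24
    return leader + directory + FT + body + RT


def read_directory(record: bytes):
    """Return (base_address, [(tag, length, start), ...])."""
    base = int(record[12:17])
    directory = record[24:base - 1]  # drop the directory's field terminator
    entries = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        entries.append((entry[0:3].decode("ascii"),
                        int(entry[3:7]),
                        int(entry[7:12])))
    return base, entries


def field_data(record: bytes, base: int, length: int, start: int) -> bytes:
    """Slice one field out by byte offset, without its terminator."""
    return record[base + start:base + start + length - 1]


# Hypothetical sample data: a control number and a title with one accent.
title = ("10" + SF + "aLes Mis\u00e9rables").encode("utf-8")
record = build_record([("001", b"demo0001"), ("245", title)])
base, entries = read_directory(record)
for tag, length, start in entries:
    print(tag, length, start, field_data(record, base, length, start))
```

The 245 field here is 18 characters but 19 bytes before its terminator. The directory has to record the byte count for the slicing in `field_data` to land on the right boundaries, so a reader that counted characters instead would drift off by one after the accented letter.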

-Tod

On Apr 17, 2012, at 6:55 PM, Jonathan Rochkind wrote:

Okay, forget XML for a moment, let's just look at marc 'binary'.

First, for Anglophone-centric MARC21.

The LC docs don't actually say quite what I thought about leader byte
09, used to advertise encoding:



a - UCS/Unicode
Character coding in the record makes use of characters from the
Universal Coded Character Set (UCS) (ISO 10646), or Unicode(tm), an
industry subset.




That doesn't say UTF-8. It says UCS or "Unicode". What does that
actually mean?  Does it mean UTF-8, or does it mean UTF-16 (closer to
what used to be called "UCS" I think?).  Whatever it actually means, do
people violate it in the wild?
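
One way to look for violations in the wild is to compare what leader position 09 declares with what the bytes actually are. The Python sketch below is only a heuristic under stated assumptions: it treats 'a' as UCS/Unicode and blank as MARC-8 (the two MARC 21 values), and it tests whether the record decodes as UTF-8, which is an assumption about what "Unicode" means in practice. The function names and the toy records (a leader plus raw bytes, not a complete ISO 2709 structure) are hypothetical.

```python
def check_declared_encoding(record: bytes) -> tuple[str, bool]:
    """Return (declared encoding, whether the bytes decode as UTF-8)."""
    flag = record[9:10]  # leader position 09
    if flag == b"a":
        declared = "UCS/Unicode"
    elif flag == b" ":
        declared = "MARC-8"
    else:
        declared = "undefined in MARC 21"
    try:
        record.decode("utf-8")
        decodes_as_utf8 = True
    except UnicodeDecodeError:
        decodes_as_utf8 = False
    return declared, decodes_as_utf8


def toy_record(flag: str, payload: bytes) -> bytes:
    """Hypothetical helper: a 24-byte leader followed by raw payload bytes."""
    leader = f"00000nam {flag}2200000 a 4500".encode("ascii")
    assert len(leader) == 24
    return leader + payload


honest = toy_record("a", "Mis\u00e9rables".encode("utf-8"))
suspect = toy_record("a", "Mis\u00e9rables".encode("latin-1"))
print(check_declared_encoding(honest))
print(check_declared_encoding(suspect))
```

A record that declares 'a' but fails to decode is a candidate violation. The reverse does not hold: pure ASCII data decodes as UTF-8 whatever the leader says, so a successful decode proves nothing about a record flagged as MARC-8.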




Now we get to non-Anglophone centric marc. I think all of which is
ISO_2709?  A standard which of course is not open access, so I can't get
it to see what it says.


But leader 09 being used for encoding -- is that Marc21 specific, or is
it true of any ISO-2709?  Marc8 and "unicode" being the only valid
encodings can't be true of any ISO-2709, right?


Is there a generic ISO-2709 way to deal with this, or not so much?


--
Karen Coyle
[email protected] http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
