> I could be mistaken (never having had the pleasure of reading it), but
> isn't ISO-2709 specified as a fixed number of characters, and any
> conflation of characters and 8-bit bytes is on the part of users and
> implementations?

I don't believe that is the case. Take UTF-8 out of the picture, and consider the MARC-8 character set with its escape sequences and combining characters. A character such as an "n" with a tilde would consist of two bytes (the combining tilde, then the base letter). The Greek small letter alpha, if invoked in accordance with ANSI X3.41, would consist of five bytes (two bytes for the initial escape sequence, a byte for the character, and then two bytes for the escape sequence returning to the default character set).

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# do...@uta.edu
# http://rocky.uta.edu/doran/

> -----Original Message-----
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Huwig,Steve
> Sent: Wednesday, April 18, 2012 9:21 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] more on MARC char encoding: Now we're about
> ISO_2709 and MARC21
>
> I could be mistaken (never having had the pleasure of reading it), but
> isn't ISO-2709 specified as a fixed number of characters, and any
> conflation of characters and 8-bit bytes is on the part of users and
> implementations?
>
> I think ISO 2709 might not know from bytes, only characters.
>
> > -----Original Message-----
> > From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> > Doran, Michael D
> > Sent: Wednesday, April 18, 2012 10:05 AM
> > To: CODE4LIB@LISTSERV.ND.EDU
> > Subject: Re: [CODE4LIB] more on MARC char encoding: Now we're about
> > ISO_2709 and MARC21
> >
> > Hi Tod,
> >
> > I'm not understanding how UTF-8 would be considered 8-bit character
> > data (other than the ASCII-range of the Unicode repertoire, natch). I
> > don't think ISO 2709 knows from characters, only bytes.
> >
> > -- Michael
> >
> > # Michael Doran, Systems Librarian
> > # University of Texas at Arlington
> > # 817-272-5326 office
> > # 817-688-1926 mobile
> > # do...@uta.edu
> > # http://rocky.uta.edu/doran/
> >
> > > -----Original Message-----
> > > From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> > > Tod Olson
> > > Sent: Wednesday, April 18, 2012 5:04 AM
> > > To: CODE4LIB@LISTSERV.ND.EDU
> > > Subject: Re: [CODE4LIB] more on MARC char encoding: Now we're about
> > > ISO_2709 and MARC21
> > >
> > > It has to mean UTF-8. ISO 2709 is very byte-oriented, from the directory
> > > structure to the byte-offsets in the fixed fields. The values in these
> > > places all assume 8-bit character data, it's completely baked in to the
> > > file format.
> > >
> > > -Tod
> > >
> > > On Apr 17, 2012, at 6:55 PM, Jonathan Rochkind wrote:
> > >
> > > > Okay, forget XML for a moment, let's just look at marc 'binary'.
> > > >
> > > > First, for Anglophone-centric MARC21.
> > > >
> > > > The LC docs don't actually say quite what I thought about leader byte
> > > > 09, used to advertise encoding:
> > > >
> > > > a - UCS/Unicode
> > > > Character coding in the record makes use of characters from the
> > > > Universal Coded Character Set (UCS) (ISO 10646), or Unicode(tm), an
> > > > industry subset.
> > > >
> > > > That doesn't say UTF-8. It says UCS or "Unicode". What does that
> > > > actually mean? Does it mean UTF-8, or does it mean UTF-16 (closer to
> > > > what used to be called "UCS" I think?). Whatever it actually means, do
> > > > people violate it in the wild?
> > > >
> > > > Now we get to non-Anglophone centric marc. I think all of which is
> > > > ISO_2709? A standard which of course is not open access, so I can't get
> > > > it to see what it says.
> > > >
> > > > But leader 09 being used for encoding -- is that Marc21 specific, or is
> > > > it true of any ISO-2709? Marc8 and "unicode" being the only valid
> > > > encodings can't be true of any ISO-2709, right?
> > > >
> > > > Is there a generic ISO-2709 way to deal with this, or not so much?
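
To make the MARC-8 byte counts discussed at the top of this message concrete, here is a minimal Python sketch. The byte values (0xE4 for the combining tilde, ESC g to select the Greek symbol set, ESC s to return to ASCII, and 0x61 for alpha) are assumptions drawn from the MARC-8 code tables, not from anything stated in this thread, and the variable names are hypothetical. It only counts bytes; it is not a MARC-8 encoder.

```python
# Sketch only: byte values are assumed from the MARC-8 code tables.
ESC = b"\x1b"

# "n" with a tilde: in MARC-8 the combining mark precedes the base letter.
# Assumption: 0xE4 is the combining tilde.
n_tilde_marc8 = b"\xe4" + b"n"

# Greek small letter alpha.
# Assumption: ESC g selects the Greek symbol set, 0x61 ("a") is alpha there,
# and ESC s returns to the default (ASCII) set.
alpha_marc8 = ESC + b"g" + b"a" + ESC + b"s"

# The same two characters in UTF-8, for comparison.
n_tilde_utf8 = "n\u0303".encode("utf-8")  # base letter + combining tilde
alpha_utf8 = "\u03b1".encode("utf-8")     # Greek small letter alpha

for label, data in [
    ("n + tilde, MARC-8", n_tilde_marc8),
    ("alpha, MARC-8", alpha_marc8),
    ("n + tilde, UTF-8 (decomposed)", n_tilde_utf8),
    ("alpha, UTF-8", alpha_utf8),
]:
    print(f"{label}: {len(data)} bytes -> {data.hex(' ')}")
```

Under those assumptions the MARC-8 lengths come out as 2 and 5 bytes, matching the counts given in the reply, while the UTF-8 forms are 3 and 2 bytes. In both encodings one character occupies more than one byte, so lengths and offsets counted in bytes will not equal counts of characters.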