Yeah, but if there's Perl code and Java code to do it, it can't be _that_ hard to port to Ruby... if I could figure out what you need to do to get first-class character encoding support in Ruby 1.9, anyway.

I mean, you could do it just as a library without that... but it's enough trouble that, yeah, I don't want to. If the benefit were first-class encoding support, same as any other encoding in Ruby 1.9, usable with the built-in tools for converting encodings and with any library that uses them... that's a bigger benefit.

But I had no idea Marc8 allowed escape sequences to temporarily switch to a different encoding. Really? Oh my god.

On 10/24/2011 3:10 PM, Doran, Michael D wrote:
Hi Jonathan,

I tried to figure out how to custom add a new encoding to Ruby 1.9 with
the idea of adding Marc8 as an actual Ruby 1.9 character encoding,
supported the same as any other built-in char encoding
Not a trivial undertaking.  Remember that the MARC-8 environment allows alternate 
character sets to be invoked within a MARC record using two different "escape" 
methods [1].  That's just one of the reasons you're not finding a bunch of these MARC-8
conversion modules, one for every language. ;-)

-- Michael

[1] Technique 1 is unique to MARC-8 and provides access to a small number of Greek 
symbols, subscripts, and superscripts. Technique 2 is based on the ANSI X3.41 (ISO 2022) 
"Code Extension Techniques for Use with 7-bit and 8-bit Character Sets" 
standard. See the MARC 21 Specification for details on accessing alternate graphic 
character sets (http://www.loc.gov/marc/specifications/speccharmarc8.html#alternative).
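To make the two escape techniques in [1] concrete, here is a rough Python sketch (Python only because it's compact; it is not taken from MARC::Charset or any other library) that splits a MARC-8 byte string into runs of text separated by escape sequences. It only segments the data; it does not map anything to Unicode. The Technique 1 bytes and the set-identifier table are as recalled from the LC spec linked above, so treat them as assumptions and check them against that page before relying on them.

```python
from typing import Iterator, NamedTuple, Optional, Tuple

ESC = 0x1B

# Technique 1 (MARC-8 specific): ESC + one final byte.
# ASSUMED values, recalled from the LC character-set spec; verify there.
TECHNIQUE_1 = {b"g": "Greek symbols", b"b": "Subscripts",
               b"p": "Superscripts", b"s": "ASCII (return to default)"}

# Technique 2 (ISO 2022 style) set identifiers -> names. ASSUMED, partial table.
TECHNIQUE_2 = {b"B": "Basic Latin (ASCII)", b"!E": "Extended Latin (ANSEL)",
               b"1": "CJK (EACC)", b"N": "Basic Cyrillic", b"Q": "Extended Cyrillic",
               b"S": "Basic Greek", b"2": "Basic Hebrew",
               b"3": "Basic Arabic", b"4": "Extended Arabic"}


class Run(NamedTuple):
    kind: str                # "text" or "esc"
    register: Optional[str]  # None for text; "T1", "G0", "G1" or "multibyte G0/G1"
    payload: bytes           # raw text bytes, or the set identifier of an escape


def _designation(inter: bytes, final: int) -> Tuple[str, bytes]:
    """Split ISO 2022 intermediates into (register, set identifier)."""
    multi = inter.startswith(b"$")
    rest = inter[1:] if multi else inter
    if rest[:1] in (b"(", b","):
        reg, rest = "G0", rest[1:]
    elif rest[:1] in (b")", b"-"):
        reg, rest = "G1", rest[1:]
    else:
        reg = "G0"  # assumption: a bare "ESC $ F" designates G0
    if multi:
        reg = "multibyte " + reg
    # Leftover intermediates (e.g. "!" in ESC ( ! E) belong to the set identifier.
    return reg, rest + bytes([final])


def split_marc8(data: bytes) -> Iterator[Run]:
    """Yield text runs and escape sequences in the order they occur."""
    i = start = 0
    n = len(data)
    while i < n:
        if data[i] != ESC:
            i += 1
            continue
        if start < i:
            yield Run("text", None, data[start:i])
        j = k = i + 1
        while k < n and 0x20 <= data[k] <= 0x2F:  # ISO 2022 intermediate bytes
            k += 1
        if k >= n:
            raise ValueError(f"truncated escape sequence at offset {i}")
        if k == j:  # no intermediates: Technique 1, a single final byte
            final = data[j:j + 1]
            if final not in TECHNIQUE_1:
                raise ValueError(f"unknown escape {final!r} at offset {i}")
            yield Run("esc", "T1", final)
        else:       # Technique 2: intermediates, then one final byte
            reg, set_id = _designation(data[j:k], data[k])
            yield Run("esc", reg, set_id)
        i = start = k + 1
    if start < n:
        yield Run("text", None, data[start:])
```

For example, `list(split_marc8(b"abc\x1b(Sxyz\x1b(B"))` gives a text run, a G0 designation of set `S` (basic Greek, per the assumed table), another text run, and a switch back to `B`. A real converter would keep the current G0/G1 sets as state and map each byte through the table for whichever set is active, which is where most of the work in MARC::Charset-style modules goes.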


-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
Jonathan Rochkind
Sent: Monday, October 24, 2011 2:01 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] marc-8

What _ought_ to be easiest of all is getting our ILSes to NEVER export
Marc8 _ever_ again.  UTF8 only.

Sadly, that only ought to be easiest.

But IMO there's no reason any of us should be dealing with Marc8 ever
again.  The only thing that should deal in Marc8 is an ILS, and it should
only input it, NEVER output it. UTF8 only, please!

But this is not the world we live in.

I tried to figure out how to custom add a new encoding to Ruby 1.9 with
the idea of adding Marc8 as an actual Ruby 1.9 character encoding,
supported the same as any other built-in char encoding, but I couldn't
figure out whether that was possible or how to do it.  If it were possible
to do it at that low level in Ruby 1.9, it might justify the time to do it.
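For comparison only, with no claim about how Ruby 1.9 handles this: Python's standard library has exactly this kind of hook, `codecs.register`, and once a codec is registered, `bytes.decode()` and `str.encode()` accept its name like any built-in encoding. The codec name `marc8_demo` is made up, and the toy decoder below handles plain ASCII only; a real one would need the escape-sequence state tracking described in [1].

```python
import codecs

# Hypothetical codec name for this sketch; Python ships no MARC-8 codec.
NAME = "marc8_demo"


def _decode(data, errors="strict"):
    """Toy decoder: accept plain ASCII only (no escapes, no ANSEL, no 8-bit bytes).

    `errors` is ignored; this sketch is always strict. A real MARC-8 decoder
    would track the G0/G1 sets switched by escape sequences and map each set
    to Unicode.
    """
    data = bytes(data)
    for pos, byte in enumerate(data):
        if byte >= 0x80 or byte == 0x1B:
            raise UnicodeDecodeError(NAME, data, pos, pos + 1,
                                     "toy codec handles plain ASCII only")
    return data.decode("ascii"), len(data)


def _encode(text, errors="strict"):
    """Toy encoder: plain ASCII text is emitted unchanged (Basic Latin is
    MARC-8's default G0 set); anything else fails under errors="strict"."""
    return text.encode("ascii", errors), len(text)


def _search(name):
    # Lookup names arrive lowercased; newer Pythons also turn "-" and " " into "_".
    if name.replace("-", "_") == NAME:
        return codecs.CodecInfo(encode=_encode, decode=_decode, name=NAME)
    return None


codecs.register(_search)

# After registration the made-up name works with the built-in conversion calls.
print(b"Hello".decode("marc8_demo"))
print("Hi".encode("marc8-demo"))
```

The design point is the one Jonathan is after: the conversion logic lives in one place, and every caller that goes through the runtime's encoding lookup gets it for free. Note that this sketch provides no incremental decoder, so it would not work with text-mode file wrappers as written.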

On 10/24/2011 2:55 PM, Doran, Michael D wrote:
Eric,

Sometimes for grandpa Perl stuff -- especially as concerns charsets and/or
internationalization -- it's worth pinging these lists:
        perl4...@perl.org (yes, still alive and kicking)

        perl-i...@perl.org (very low traffic list, but some knowledgeable
subscribers)
-- Michael

-----Original Message-----
From: Doran, Michael D
Sent: Monday, October 24, 2011 1:48 PM
To: 'Code for Libraries'
Subject: RE: [CODE4LIB] marc-8

Okay. How do I go about converting MARC-8 encoded records into UTF-8?
In Perl... using the handy MARC::Charset module (tip o' the hat to Ed
Summers, and now maintained by Galen Charlton).

-- Michael

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
Eric Lease Morgan
Sent: Monday, October 24, 2011 1:39 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] marc-8

On Oct 24, 2011, at 2:34 PM, Doran, Michael D wrote:

In Perl, how do I specify MARC-8 when reading (decoding) and writing
(encoding) data?
You can't.  MARC-8 is a character set that is unknown to the operating
system.  Your best bet is to convert MARC-8-encoded records into UTF-8.

/me throws his hands up in the air and screams!

Okay. How do I go about converting MARC-8 encoded records into UTF-8? I
know yaz-marcdump changes the encoding bit in MARC leaders. Does it also
convert MARC-8 characters to UTF-8? (I guess I could simply try it and see
what happens.)

--
Eric Morgan
