Hi Brendan: Ahh the lovely MARC-8 :-)
It's a fair bit of effort, I think. One approach could be to port the MARC-8 to Unicode functionality from pymarc [1,2]. It's only one-way, but that's normally what most sane people want to do anyhow. Another approach would be to look into wrapping yaz-iconv [3] from IndexData, which provides much more (and faster) MARC-related character mapping facilities.

If you just want to get something done without extending ruby-marc, you can pre-process your data with yaz-marcdump and then throw it at ruby-marc. Or perhaps, if you are in jruby-land, you could use marc4j [4], which has MARC-8 support.

I've cc'ed code4lib since someone else might have some better ideas. Thanks for writing.

//Ed

[1] http://bazaar.launchpad.net/~ehs-pobox/pymarc/dev/annotate/head%3A/pymarc/marc8.py
[2] http://bazaar.launchpad.net/~ehs-pobox/pymarc/dev/annotate/head%3A/pymarc/marc8_mapping.py
[3] http://www.indexdata.com/yaz/doc/yaz-iconv.html
[4] http://marc4j.tigris.org/

On Fri, Oct 30, 2009 at 3:22 AM, Brendan Boesen <bboe...@nla.gov.au> wrote:
> Hi Guys,
>
> I guess this is the 'bug the authors if you need it' email.
>
> I'm trying to parse a MARC record and it contains Chinese characters. From
> the leader:
> 01051cam 2200265 a 4504
> it looks like the record uses MARC8 encoding.
>
> I'm investigating a way to get a Unicode encoded one but that may not work
> out. What sort of effort do you think is involved in adding MARC8 support
> into marc-ruby? (And is there anything I could do to help with that?)
>
> Regards,
>
> Brendan Boesen
> National Library of Australia
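
A note on the leader check in Brendan's message: in MARC 21, leader position 09 (counting from 0) holds the character coding scheme, where a blank means MARC-8 and 'a' means UCS/Unicode. Below is a minimal Python sketch of that check, which might help when deciding which records need converting before they go to ruby-marc. The function name and the two sample leaders are made up for illustration; they are not part of pymarc or ruby-marc, and the samples assume a full 24-character leader with its whitespace intact.

```python
def coding_scheme(leader):
    """Classify a MARC 21 leader by its character coding scheme (position 09)."""
    if len(leader) != 24:
        raise ValueError("a MARC 21 leader is exactly 24 characters long")
    flag = leader[9]
    if flag == " ":
        return "MARC-8"      # blank: MARC-8
    if flag == "a":
        return "Unicode"     # 'a': UCS/Unicode
    return "unknown"         # anything else is not defined by MARC 21


# Hypothetical leaders, for illustration only.
marc8_leader = "01051cam  2200265 a 4500"    # position 09 is blank
unicode_leader = "01051cam a2200265 a 4500"  # position 09 is 'a'

print(coding_scheme(marc8_leader))
print(coding_scheme(unicode_leader))
```

The check only reports what the leader claims. Records in the wild are sometimes mislabelled, so it is a first filter rather than proof of the actual encoding.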