On 10/24/2011 2:52 PM, Ross Singer wrote:
> On Mon, Oct 24, 2011 at 7:39 PM, Eric Lease Morgan <emor...@nd.edu> wrote:
>> Okay. How do I go about converting MARC-8 encoded records into UTF-8? I know
>> yaz-marcdump changes the encoding bit in MARC leaders. Does it also convert
>> MARC-8 characters to UTF-8? (I guess I could simply try it and see what
>> happens.)
>
> Yes, it does. It uses yaz-iconv. Theoretically, you could wrap some
> Perl module around that. I've contemplated it for ruby-marc, but then
> it always seems a lot easier to ignore it and delete any emails that
> request it.

Or use jruby, where you can use Marc4J. Or actually port either the
Java or (apparently?) Perl version into ruby. Okay, that one is not
"easier" than anything in the short term, but in the long term I'd
rather have pure ruby than something that relies on an external bash
call or a C extension. Those latter two are invariably going to be
annoying and confusing to maintain down the line, in my experience.
But I'm not doing any of these things anytime soon either. So far all my
ruby that deals with Marc gets something else to convert it first. (In
my largest case, Java Marc4J converts it before it's stored in a stored
field in a Solr index, and my ruby only gets it from the stored field in
Solr, already converted).
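
For what it's worth, the "external call" route discussed above might look
roughly like the sketch below. It is only an illustration, not anything
ruby-marc ships: the helper names marc8_to_utf8_command and
convert_marc8_file are made up for this example, and it assumes
yaz-marcdump is installed and on the PATH. The flags are the usual
yaz-marcdump ones: -f and -t give the source and target character sets,
-o marc keeps ISO 2709 output, and -l 9=97 sets leader position 9 to "a"
(ASCII 97) to flag the record as Unicode.

```ruby
require "open3"

# Hypothetical helper: build the yaz-marcdump argument list for a
# MARC-8 to UTF-8 conversion of the file at input_path.
def marc8_to_utf8_command(input_path)
  [
    "yaz-marcdump",
    "-f", "MARC-8",  # source character set
    "-t", "UTF-8",   # target character set
    "-o", "marc",    # write binary MARC (ISO 2709), not a text dump
    "-l", "9=97",    # leader position 9 becomes "a" (ASCII 97): Unicode
    input_path
  ]
end

# Hypothetical helper: run the conversion and return the converted
# records as a binary string. Raises if yaz-marcdump exits non-zero.
def convert_marc8_file(input_path)
  stdout, stderr, status =
    Open3.capture3(*marc8_to_utf8_command(input_path), binmode: true)
  raise "yaz-marcdump failed: #{stderr}" unless status.success?
  stdout
end
```

Passing the command as an array means no shell is involved, so odd
characters in file names are not an issue. If yaz-marcdump is missing,
Open3.capture3 raises Errno::ENOENT, which is exactly the kind of
external dependency I was grumbling about above.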