RE: Help for utf-8 output

2008-02-21 Thread Doran, Michael D
Hi Jackie, I'm working on a very similar problem... converting theses/dissertations records (in XML) to MARC records. I'm still in the testing stage, but have had similar problems with records with diacritics in the 100 or 245 fields (however diacritics in a 520a field don't seem to cause any

Re: Help for utf-8 output

2008-02-21 Thread Brian Sheppard
I'd suggest you first make sure your XML is really UTF-8, using JHOVE: /path/to/jhove/jhove -c /path/to/jhove/conf/jhove.conf -m utf8-hul myFile.xml If it fails, you could convert it to UTF-8, on the (perhaps unwarranted) assumption that it's Windows Latin-1: iconv -c -f windows-1252 -t UTF-8
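For anyone without JHOVE or iconv to hand, the same check-then-convert idea can be sketched in Python. This is only an illustration, not what Brian ran: the function name decode_with_fallback is made up for the example, and the Windows-1252 fallback is the same (perhaps unwarranted) assumption as above.

```python
def decode_with_fallback(raw):
    """Return (text, encoding_used) for a byte string.

    Strict UTF-8 decoding is tried first. If that fails, the bytes are
    ASSUMED to be Windows-1252, which may not be true for a given file.
    """
    try:
        # Strict decoding raises UnicodeDecodeError on any invalid sequence,
        # so success means the input is well-formed UTF-8.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # errors="ignore" silently drops bytes that cp1252 leaves undefined,
        # roughly what iconv's -c flag does.
        return raw.decode("cp1252", errors="ignore"), "cp1252"


# Hypothetical sample inputs: the same character in each encoding.
utf8_text, utf8_used = decode_with_fallback(b"\xc2\xae")
legacy_text, legacy_used = decode_with_fallback(b"\xae")
```

Writing the returned text back out with text.encode("utf-8") gives UTF-8 bytes in either case. Note that a file can pass the strict check and still contain the wrong characters, so this only rules out malformed input.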

Re: Help for utf-8 output

2008-02-21 Thread Galen Charlton
Hi Jackie, On Tue, Feb 19, 2008 at 10:49 AM, Shieh, Jackie [EMAIL PROTECTED] wrote: What I have is an Excel spreadsheet for dissertations which I have saved as a tab delimited file (examining the file in TextPad, the diacritics appear to be fine), then read in and output the file as a utf-8

RE: Help for utf-8 output

2008-02-21 Thread Doran, Michael D
Hi Brian, Thanks for your response. "I'd suggest you first make sure your XML is really UTF-8" I believe it is. I used a hex editor to look at the XML source file, and the character in question (the Registered Sign) is encoded as hex c2 ae, which is the proper UTF-8 encoding for that character
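The hex-editor check can be reproduced in a few lines of Python. This is a small sketch that only confirms the byte-level claim (c2 ae is the Registered Sign in UTF-8); the last line shows what the same two bytes look like if some later step reads them as Latin-1, offered as an illustration and not as a diagnosis of the problem in this thread.

```python
import unicodedata

# The two bytes seen in the hex editor.
raw = bytes.fromhex("c2ae")

# Strict UTF-8 decoding yields a single character.
char = raw.decode("utf-8")
codepoint = "U+{:04X}".format(ord(char))  # U+00AE
name = unicodedata.name(char)             # REGISTERED SIGN

# Round trip: encoding the character again gives back the same bytes.
round_trip = char.encode("utf-8")

# Illustration only: the same bytes misread as Latin-1 become two
# characters (capital A with circumflex, then the Registered Sign).
misread = raw.decode("latin-1")
```

If output that should show one character shows two, as in misread, the bytes were probably valid UTF-8 that got decoded with a single-byte encoding somewhere downstream.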