Hi Jackie,
I'm working on a very similar problem... converting theses/dissertations
records (in XML) to MARC records. I'm still in the testing stage, but I have had
similar problems with records that have diacritics in the 100 or 245 fields (however,
diacritics in a 520a field don't seem to cause any problems).
I'd suggest you first make sure your XML is really UTF-8, using JHOVE:
/path/to/jhove/jhove -c /path/to/jhove/conf/jhove.conf -m utf8-hul myFile.xml
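If JHOVE isn't to hand, a rough stand-in for the same check is to decode the raw bytes strictly and report where decoding first fails. This is a minimal sketch, not a replacement for JHOVE's utf8-hul module; the function names `first_invalid_utf8_offset` and `check_file` are hypothetical, and it relies only on Python's built-in strict UTF-8 codec.

```python
import sys


def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset of the first malformed UTF-8 sequence, or None if the data decodes cleanly."""
    try:
        data.decode("utf-8")  # strict error handling: raises on any malformed sequence
    except UnicodeDecodeError as err:
        return err.start  # index of the first offending byte
    return None


def check_file(path: str) -> bool:
    """Print a one-line verdict for a file and return True if it is well-formed UTF-8."""
    with open(path, "rb") as fh:  # read raw bytes, no decoding yet
        offset = first_invalid_utf8_offset(fh.read())
    if offset is None:
        print(f"{path}: well-formed UTF-8")
        return True
    print(f"{path}: invalid UTF-8 at byte offset {offset}")
    return False


if __name__ == "__main__":
    # Hypothetical usage: python check_utf8.py myFile.xml [more files ...]
    for path in sys.argv[1:]:
        check_file(path)
```

The reported offset lets you jump straight to the suspect byte in a hex editor.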
If it fails, you could convert it to UTF-8, on the (perhaps unwarranted)
assumption that it's Windows Latin-1 (windows-1252):
iconv -c -f windows-1252 -t UTF-8
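The conversion above only helps if the input really is windows-1252. The Python sketch below shows the same byte-level transformation, and what goes wrong if it is applied to data that is already UTF-8: the Registered Sign gets double-encoded. The helper names are hypothetical. Unlike `iconv -c`, which silently discards characters it cannot convert, Python's `cp1252` codec raises an error on the few bytes Windows-1252 leaves undefined.

```python
def cp1252_to_utf8(data: bytes) -> bytes:
    """Re-encode Windows-1252 bytes as UTF-8 (roughly what the iconv command above does)."""
    return data.decode("cp1252").encode("utf-8")


def looks_like_utf8(data: bytes) -> bool:
    """True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_utf8(data: bytes) -> bytes:
    """Convert only when the input is not already valid UTF-8, to avoid double encoding."""
    return data if looks_like_utf8(data) else cp1252_to_utf8(data)


# In Windows-1252 the Registered Sign is the single byte 0xAE.
print(cp1252_to_utf8(b"\xae").hex())      # c2ae
# Converting bytes that were *already* UTF-8 double-encodes them (shows up as "Â®").
print(cp1252_to_utf8(b"\xc2\xae").hex())  # c382c2ae
print(to_utf8(b"\xc2\xae").hex())         # c2ae
```

The `looks_like_utf8` guard is only a heuristic: a pure-ASCII file, or one whose windows-1252 bytes happen to form valid UTF-8 sequences, passes it unchanged.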
On Tue, Feb 19, 2008 at 10:49 AM, Shieh, Jackie [EMAIL PROTECTED] wrote:
What I have is an Excel spreadsheet for dissertations, which I have saved as
a tab-delimited file (examining the file in TextPad, the diacritics appear
to be fine), then read in and output the file as a utf-8
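The step that most often goes wrong in this kind of workflow is reading the tab-delimited export with the wrong encoding. The sketch below assumes the export is windows-1252 (typical for Excel's plain "Text (Tab delimited)" format on a Western-locale Windows machine, but an assumption here that should be confirmed for your file). It parses the rows and writes them back out as UTF-8. `tsv_to_utf8` and its `source_encoding` parameter are hypothetical names.

```python
import csv
import io


def tsv_to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode a tab-delimited export using an assumed source encoding and re-emit it as UTF-8."""
    text = raw.decode(source_encoding)  # raises if the assumed encoding is wrong for some byte
    # newline="" lets the csv module handle Excel's \r\n line endings itself
    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter="\t"))
    out = io.StringIO(newline="")
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue().encode("utf-8")


# "Café ®" as windows-1252 bytes comes out as proper UTF-8 (é -> c3 a9, ® -> c2 ae).
print(tsv_to_utf8(b"Title\tAuthor\r\nCaf\xe9 \xae\tSmith\r\n"))
```

Parsing into rows, rather than just re-encoding the whole file, leaves a natural place to map columns onto XML elements before writing.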
Hi Brian,
Thanks for your response.
> I'd suggest you first make sure your XML is really UTF-8
I believe it is. I used a hex editor to look at the XML source file, and the
character in question (the Registered Sign, U+00AE) is encoded as hex C2 AE, which
is the proper UTF-8 encoding for that character.
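The hex-editor reading can be checked by hand. U+00AE needs 8 significant bits, so it falls in UTF-8's two-byte range (U+0080 to U+07FF), which uses the pattern 110xxxxx 10xxxxxx. The sketch below hand-encodes it and compares the result with Python's built-in codec. `utf8_two_byte` is a hypothetical helper written for illustration.

```python
def utf8_two_byte(code_point: int) -> bytes:
    """Hand-encode a code point in U+0080..U+07FF using UTF-8's pattern 110xxxxx 10xxxxxx."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("two-byte UTF-8 covers U+0080..U+07FF only")
    lead = 0xC0 | (code_point >> 6)     # 110 + the top 5 of the 11 payload bits
    trail = 0x80 | (code_point & 0x3F)  # 10 + the low 6 payload bits
    return bytes([lead, trail])


registered = 0x00AE  # REGISTERED SIGN
print(utf8_two_byte(registered).hex())         # c2ae
print(chr(registered).encode("utf-8").hex())   # c2ae (the built-in codec agrees)
print(chr(registered).encode("cp1252").hex())  # ae (a single byte in Windows-1252)
```

By the same arithmetic, a lone `ae` byte in the file would indicate unconverted windows-1252, and the four bytes `c3 82 c2 ae` would indicate a double conversion.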