Why do you think the file is correct? It sounds like 'vi' (aka vim?) did not write a valid UTF-8 file. Maybe it was working in some 8-bit character set (i.e., a non-Unicode, 256-character code set such as ISO-8859-1/Latin-1) and used a value from the upper 128 characters, which lie outside ASCII. A UTF-8 decoder would interpret such a byte as the start of a multi-byte character, but the bytes that follow would not be valid continuation bytes.
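To make that concrete, here is a minimal Java sketch. It assumes the file was saved as ISO-8859-1, which is a guess and not something confirmed in this thread. In that encoding ô is the single byte 0xF4, which UTF-8 treats as the lead byte of a 4-byte sequence; that would be consistent with the "Invalid byte 2 of 4-byte UTF-8 sequence" error from Xerces. The class and method names (EncodingCheck, isValidUtf8, latin1ToUtf8) are made up for illustration and are not part of Castor or Xerces. StandardCharsets needs Java 7 or later; on Java 1.4.2 you would use Charset.forName("UTF-8") instead.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class EncodingCheck {

    /** Returns true if the bytes decode as well-formed UTF-8 (strict, no replacement). */
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Re-encodes bytes as UTF-8, assuming (not verifying) that the input is ISO-8859-1. */
    static byte[] latin1ToUtf8(byte[] latin1Bytes) {
        String text = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "ô·" as an ISO-8859-1 editor would store it: 0xF4 0xB7.
        byte[] latin1 = {(byte) 0xF4, (byte) 0xB7};
        System.out.println(isValidUtf8(latin1));   // false: 0xF4 starts a 4-byte sequence

        // The same two characters in UTF-8: 0xC3 0xB4 0xC2 0xB7.
        byte[] utf8 = latin1ToUtf8(latin1);
        System.out.println(isValidUtf8(utf8));     // true
        System.out.println(utf8.length);           // 4
    }
}
```

If the check fails on your real file, either convert the file to UTF-8 (along the lines of latin1ToUtf8) or declare the encoding the file was actually written in, so that the bytes and the XML declaration agree.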
Cat'ing the file to the terminal probably won't work either, as many terminal emulators run with some 8-bit character set (e.g., ISO-8859-1) with 256 characters and not Unicode. Even in 'vi' you probably have to tell it that the file is UTF-8 and not an 8-bit encoding.

-----Original Message-----
From: Pander [mailto:[EMAIL PROTECTED]
Sent: Friday, January 06, 2006 12:07 PM
To: dev@castor.codehaus.org
Subject: [castor-dev] How to handle UTF-8 characters like · and ô

>>> Because I did not get an answer to my post on the user list, I am reposting this on the dev list <<<

Hi all,

With Castor 1.0M1 and Java 1.4.2 I have the following problem with special characters. (It happens both with the Blackdown Java(TM) 2 SDK, Standard Edition, from the Ubuntu Breezy Badger package AND with the j2sdk from Sun for Linux.)

An XML file holds special characters such as · (centered dot) and ô (o with a ^ above it). The XML file was created with vi, but when I cat it to my terminal these special characters look like empty squares. When I unmarshal the XML file and write the string to a file, these special characters are all black diamonds with a white question mark inside (both when opening the file with vi and when catting it to my terminal).

I have tested the XML file with:

<?xml version="1.0" encoding="Latin1"?>

and

<?xml version="1.0" encoding="UTF-8"?>

Both give the same result as described above. However, validating the XML with org.apache.xerces.parsers.DOMParser results in an error for the UTF-8 case:

[Fatal Error] test.xml:8:47: Invalid byte 2 of 4-byte UTF-8 sequence.
Exception caught in main: org.xml.sax.SAXParseException: Invalid byte 2 of 4-byte UTF-8 sequence.

How can I fix this so I can use these special characters?

Thanks,
Pander

-------------------------------------------------
If you wish to unsubscribe from this list, please send an empty message to the following address: [EMAIL PROTECTED]
-------------------------------------------------