FF FE means this Unicode file is little-endian. The embedded timestamp is Apr 18, 2005. It looks like your file is in UTF-16. Was the file generated by Hackystat?
At 06:15 AM 5/24/2005, Philip Johnson wrote:
The potentially bigger issue is _why_ we have files being created with the wrong encoding. As a first step, can you characterize which files (from which user (email, not user key)) blow up? Is this really old data, or really new data? As a simple test of your hypothesis, manually edit the encoding attribute on this file to be UTF-16, then see if it reads in properly.

Cheers, Philip

--On Monday, May 23, 2005 11:35 PM -1000 "(Cedric) Qin ZHANG" <[EMAIL PROTECTED]> wrote:

Hi, I have encountered some of our sensor data files that are in Unicode. If you look at them in a text editor, they look good and everything is cool:

<?xml version="1.0" encoding="UTF-8"?>
<sensor>
<entry tstamp="1113819173814" ....
....

However, if you use a hex editor, you would see:

FF FE 3C 00 3F 00 78 00 6D 00 6C 00 ....

FF FE: (my guess) the Unicode byte-order mark
3C 00: <
3F 00: ?
78 00: x
6D 00: m
6C 00: l

Obviously, the file uses UTF-16 encoding. The problem is that when I use JDOM to parse it:

Document doc = new SAXBuilder().build(fileName);

it gives the exception: "Error on line 1: Document root element is missing." I think JDOM is confused by the "FF FE" at the beginning of the file. Does anybody know how to solve the problem?

Thanks, Cedric
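One workaround (a sketch, not from the thread): since the file's XML declaration says UTF-8 but the bytes are UTF-16LE, you can sniff the FF FE byte-order mark yourself, consume it, and hand the parser a Reader built with the correct charset. A Reader passed to an XML parser takes precedence over the encoding pseudo-attribute in the declaration, so the bogus encoding="UTF-8" is ignored. The class name BomSniffer and the two-charset fallback are illustrative assumptions; only the Java standard library is used here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class BomSniffer {

  /**
   * Wraps the stream in a Reader. If the stream begins with the UTF-16
   * little-endian byte-order mark (FF FE), the BOM is consumed and the
   * rest is decoded as UTF-16LE; otherwise the bytes are pushed back and
   * decoded as UTF-8.
   */
  public static Reader sniff(InputStream in) throws IOException {
    PushbackInputStream pb = new PushbackInputStream(in, 2);
    byte[] bom = new byte[2];
    int n = pb.read(bom, 0, 2);
    if (n == 2 && (bom[0] & 0xFF) == 0xFF && (bom[1] & 0xFF) == 0xFE) {
      // UTF-16LE BOM found: leave it consumed so the decoder never sees it.
      return new InputStreamReader(pb, StandardCharsets.UTF_16LE);
    }
    if (n > 0) {
      pb.unread(bom, 0, n); // no BOM: restore the bytes we peeked at
    }
    return new InputStreamReader(pb, StandardCharsets.UTF_8);
  }
}
```

With JDOM, the idea would then be to call SAXBuilder's Reader-based build method, e.g. new SAXBuilder().build(BomSniffer.sniff(new FileInputStream(fileName))), instead of passing the file name directly.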
