I randomly checked about 20 files on the public server; every single
file is the same: Unicode-encoded, starting with FF FE.
I am working on an issue where CruiseControl cannot distinguish between
AutoStart and ManualStart in some cases (HACK-259). I wrote a simple
Java program that reads the existing build sdt data files and fixes the
start type. So this is a separate program.
The weird thing is that nothing blows up in Hackystat, but it blows up
in my application.
Any ideas?
Cheers,
Cedric
Hongbing Kou wrote:
FF FE means this Unicode file is little-endian. The embedded timestamp is
Apr 18, 2005.
It looks like your file is in UTF-16. Is the file generated by Hackystat?
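(For reference, the Apr 18, 2005 date falls out of the tstamp attribute,
which is milliseconds since the Unix epoch. A minimal sketch; the class
name TstampDemo is mine:)

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class TstampDemo {
  public static void main(String[] args) {
    // tstamp="1113819173814" is milliseconds since 1970-01-01 UTC.
    SimpleDateFormat fmt = new SimpleDateFormat("MMM d, yyyy", Locale.US);
    fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
    System.out.println(fmt.format(new Date(1113819173814L)));  // Apr 18, 2005
  }
}
```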
At 06:15 AM 5/24/2005, Philip Johnson wrote:
The potentially bigger issue is _why_ we have files being created with the
wrong encoding. As a first step, can you characterize which files (from
which user (email, not user key)) blow up? Is this really old data, or
really new data?
As a simple test of your hypothesis, manually edit the encoding attribute
on this file to be UTF-16, then see if it reads in properly.
Cheers,
Philip
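(To characterize which files are affected, one cheap check is whether the
file's first two bytes are the FF FE mark. A rough sketch; the class and
helper names are mine, and the temp file just simulates a bad data file:)

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BomCheck {
  /** Returns true if the file starts with the UTF-16LE byte-order mark FF FE. */
  public static boolean hasUtf16LeBom(File f) throws IOException {
    try (InputStream in = new FileInputStream(f)) {
      int b0 = in.read();
      int b1 = in.read();
      return b0 == 0xFF && b1 == 0xFE;
    }
  }

  public static void main(String[] args) throws IOException {
    // Simulate a bad sensor data file: BOM followed by "<" in UTF-16LE.
    File f = File.createTempFile("sensor", ".xml");
    f.deleteOnExit();
    try (OutputStream out = new FileOutputStream(f)) {
      out.write(new byte[] {(byte) 0xFF, (byte) 0xFE, 0x3C, 0x00});
    }
    System.out.println(hasUtf16LeBom(f));  // true
  }
}
```

Running this check over a directory of sensor data files would show whether
the problem is confined to one user or one date range.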
--On Monday, May 23, 2005 11:35 PM -1000 "(Cedric) Qin ZHANG"
<[EMAIL PROTECTED]> wrote:
Hi,
I have found that some of our sensor data files are in Unicode.
If you look at them using a text editor, they look good and everything
is cool.
<?xml version="1.0" encoding="UTF-8"?>
<sensor>
<entry tstamp="1113819173814" ....
....
However, if you use a hex editor, you would see:
FF FE 3C 00 3F 00 78 00 6D 00 6C 00....
FFFE: (my guess) Unicode byte-order mark
3C00: <
3F00: ?
7800: x
6D00: m
6C00: l
Obviously, the file actually uses UTF-16 (little-endian) encoding.
The problem is when I use JDOM to parse it:
Document doc = new SAXBuilder().build(fileName);
It gives exception:
"Error on line 1: Document root element is missing."
I think JDOM is confused by "FFFE" at the beginning of the file.
Does anybody know how to solve the problem?
Thanks
Cedric
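(One workaround that should sidestep the mismatched declaration: hand the
parser a character stream instead of the raw file. Per the XML spec, the
encoding pseudo-attribute in the declaration is ignored when a Reader
supplies the characters, and Java's "UTF-16" charset consumes the BOM
itself. A minimal sketch using the JDK's built-in DOM parser so it is
self-contained; with JDOM the equivalent would be passing the same Reader
to SAXBuilder.build. Class and method names are mine, and the temp file
just recreates the bad byte layout:)

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class Utf16SensorDemo {
  /** Writes UTF-16LE bytes with an FF FE BOM but a declaration claiming UTF-8. */
  public static File writeBadFile() throws IOException {
    String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<sensor><entry tstamp=\"1113819173814\"/></sensor>";
    File f = File.createTempFile("sensor", ".xml");
    f.deleteOnExit();
    try (OutputStream out = new FileOutputStream(f)) {
      out.write(new byte[] {(byte) 0xFF, (byte) 0xFE});  // little-endian BOM
      out.write(xml.getBytes(StandardCharsets.UTF_16LE));
    }
    return f;
  }

  /** Parses via a Reader, so the bogus encoding declaration is ignored. */
  public static Document parseUtf16(File f) throws Exception {
    try (Reader reader = new InputStreamReader(new FileInputStream(f), "UTF-16")) {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(reader));
    }
  }

  public static void main(String[] args) throws Exception {
    Document doc = parseUtf16(writeBadFile());
    System.out.println(doc.getDocumentElement().getTagName());  // sensor
  }
}
```

Of course the real fix is to stop Hackystat writing UTF-16 files with a
UTF-8 declaration in the first place; this only lets existing files be read.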