call.setProperty(Call.CHARACTER_SET_ENCODING, "UTF-16");

On 7/5/06, Davanum Srinivas <[EMAIL PROTECTED]> wrote:
Matt,

Please try setting CHARACTER_SET_ENCODING in the call's properties to
utf-16 and see if that works.

-- dims

On 7/5/06, Matthew Brown <[EMAIL PROTECTED]> wrote:
> I've tried to add a handler to simply log the messages, but it seems to me
> (a beginner) that the Handler doesn't come into play until after the XML is
> parsed/deserialized.
>
> Just to serve as a confirmation, can anyone comment on how Xerces
> determines what encoding the XML is in? Will it look at the prolog, the
> byte order mark, etc.?
>
> Thanks
>
>
> -----Original Message-----
> From: Manuel Mall [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, July 05, 2006 11:24 AM
> To: [email protected]
> Subject: Re: Two questions - BOM in UTF-8, and manually cleaning XML
>
>
> On Wednesday 05 July 2006 23:12, Matthew Brown wrote:
> > Two bytes per char; Etherpeak is showing the second byte as 00.
> >
> Seems you are stuck between a rock and a hard place here. The byte
> stream appears to be correctly UTF-16 encoded but the XML prolog says
> utf-8. Not sure what to recommend. Fixing it at the source is the obvious
> answer but not easily done. You may be able to write a handler that
> re-encodes the byte stream into UTF-8 before giving it to the Axis stack.
> But how to write such an Axis handler and how to hook it correctly into
> the Axis processing chain is outside my area of expertise.
>
> Maybe someone else can give advice on how to attempt such a thing.
>
> Manuel
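
The re-encoding step Manuel describes can be sketched in plain Java. This is only the byte conversion, under the assumption that the whole response is available as a byte array that starts with a UTF-16 BOM; how to hook it into the Axis processing chain is not covered here, and the class and method names are hypothetical. Since the prolog already says utf-8, the prolog and the bytes agree once the stream has been re-encoded.

```java
import java.nio.charset.StandardCharsets;

final class Utf16ToUtf8 {
    /**
     * Re-encodes a UTF-16 byte stream as UTF-8.
     * Java's "UTF-16" decoder uses a leading BOM to pick the byte order
     * (big endian if there is none) and leaves the BOM out of the decoded text,
     * so the UTF-8 result starts directly with the first real character.
     */
    static byte[] transcode(byte[] utf16Bytes) {
        String text = new String(utf16Bytes, StandardCharsets.UTF_16);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // FF FE (little-endian BOM) followed by "<a/>" in UTF-16LE.
        byte[] input = {(byte) 0xFF, (byte) 0xFE, '<', 0, 'a', 0, '/', 0, '>', 0};
        byte[] output = transcode(input);
        System.out.println(new String(output, StandardCharsets.UTF_8)); // prints <a/>
    }
}
```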
> > -----Original Message-----
> > From: Manuel Mall [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, July 05, 2006 11:09 AM
> > To: [email protected]
> > Subject: Re: Two questions - BOM in UTF-8, and manually cleaning XML
> >
> > On Wednesday 05 July 2006 23:04, Matthew Brown wrote:
> > > Manuel,
> > >
> > > I believe you hit the nail on the head - the response prolog
> > > says utf-8 but (according to Etherpeak) the BOM is ff/ef.
> > > Coincidentally, by the time the response XML gets logged by Axis,
> > > these initial characters are logged as ef bf bd ef bf bd.
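
An aside on the ef bf bd ef bf bd bytes: that is the pattern you get when the two BOM bytes are decoded as UTF-8 and written out again, because each invalid byte is replaced by U+FFFD, which is ef bf bd in UTF-8. A minimal Java sketch, assuming the bytes on the wire are the UTF-16 little-endian BOM FF FE. The class and method names are made up for illustration, and this is plain JDK behaviour, not a statement about what Axis does internally.

```java
import java.nio.charset.StandardCharsets;

final class BomMisdecodeDemo {
    /** Formats bytes as lowercase hex pairs separated by spaces. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Decodes the bytes as UTF-8 (wrongly, on purpose) and encodes the result again. */
    static byte[] roundTripAsUtf8(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] utf16LeBom = {(byte) 0xFF, (byte) 0xFE};
        System.out.println(hex(roundTripAsUtf8(utf16LeBom))); // prints ef bf bd ef bf bd
    }
}
```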
> >
> > Matt,
> >
> > What about the rest of the byte stream when you look at it in
> > Etherpeak? Is it UTF-16 encoded (2 bytes per char) or UTF-8 encoded
> > (1 byte per char for all typical ASCII characters)?
> >
> > Manuel
> >
> > > Unfortunately we may be in a bit of a tough place with having the
> > > producer of the XML change it; the customer whose web services we
> > > are consuming doesn't seem to see any issue with this (as they are
> > > fine with their .NET tools).
> > >
> > > If it is the case where we are seeing a UTF-16 BOM but a prolog
> > > that declares UTF-8; is there any way to instruct Axis/Xerces to
> > > parse it as UTF-16? Sorry if this question doesn't make much sense,
> > > but I'm not too familiar with how Axis and/or Xerces decide which
> > > character encoding to use when reading the XML.
> > >
> > > Thanks again
> > > Matt
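
On forcing a parser to read the bytes as UTF-16 despite the prolog: one general SAX/JAXP technique is to hand the parser a character stream instead of a byte stream. When an InputSource carries a Reader, the parser reads characters directly and disregards the encoding declaration. A sketch using the JDK's built-in JAXP parser follows; whether and where Axis lets you substitute the input this way is not established in this thread, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class ForceUtf16Parse {
    /**
     * Parses raw bytes as UTF-16 regardless of what the XML prolog declares.
     * The decoding happens in the InputStreamReader, so the parser only sees
     * characters and never applies the declared encoding to the bytes.
     */
    static Document parseAsUtf16(byte[] raw) {
        Reader reader = new InputStreamReader(
                new ByteArrayInputStream(raw), StandardCharsets.UTF_16);
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(reader));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse input as UTF-16 XML", e);
        }
    }
}
```

Usage would be along the lines of `Document doc = ForceUtf16Parse.parseAsUtf16(responseBytes);`, where `responseBytes` is assumed to be the raw response starting with the UTF-16 BOM.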
> > >
> > > -----Original Message-----
> > > From: Manuel Mall [mailto:[EMAIL PROTECTED]
> > > Sent: Wednesday, July 05, 2006 10:58 AM
> > > To: [email protected]
> > > Subject: Re: Two questions - BOM in UTF-8, and manually cleaning
> > > XML
> > >
> > > On Wednesday 05 July 2006 22:16, Axel Bock wrote:
> > > > Yes, there is a work-around. It works if you encode the file with
> > > > UTF-8 (for example), and do not include the BOM at the beginning.
> > > > I use notepad++ for that task, where you can save in "UTF-8
> > > > without BOM".
> > > >
> > > > The process for that is easy:
> > > > 1. open the file in notepad++
> > > > 2. mark everything via CTRL-A
> > > > 3. cut (not copy!)
> > > > 4. in the format menu, choose "ANSI" formatting and select "UTF-8
> > > > without BOM" at the bottom
> > > > 5. paste
> > > > 6. save.
> > > >
> > > > that is a crap workaround, but works for me. for automatically
> > > > generated files ..... I dunno :-)
> > > >
> > > >
> > > > Greetings,
> > > > Axel.
> > > >
> > > >
> > > > On 7/5/06, Matthew Brown <[EMAIL PROTECTED]> wrote:
> > > >
> > > > Hi all,
> > > >
> > > > I hate to do this, but can anyone please help me with either of
> > > > these issues? I've tried to upgrade Xerces to 2.8.0 but to no
> > > > avail.
> > > >
> > > > Is there anything else I could be doing?
> > >
> > > Just wondering if your file in question starts with hex 'ef bb bf'
> > > or 'ff fe' or 'fe ff'. If it is one of the latter two forms I
> > > believe you have a UTF-16 encoded file (little endian or big
> > > endian), not UTF-8. If it is the 'ef bb bf' sequence then it starts
> > > correctly with the UTF-8 encoded Unicode code point for the BOM, U+FEFF.
> > > In all cases Xerces should be able to handle it. A problem may
> > > arise if it starts with 'ff fe' but the XML prolog says
> > > encoding="utf-8", as that is a contradiction I believe.
> > >
> > > I know this does not help directly but may help to check if the
> > > problem is with the producer of the XML document or your consumer.
> > >
> > > Manuel
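
The check Manuel describes (look at the first bytes and compare them with the known BOM patterns) can be sketched as below. The BOM byte values are the standard ones: EF BB BF for UTF-8, FF FE for UTF-16 little endian, FE FF for UTF-16 big endian. The class and method names are hypothetical, and the sketch deliberately ignores UTF-32, whose little-endian BOM also begins with FF FE.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class BomSniffer {
    /** Returns the charset implied by a leading BOM, or null if no BOM is present. */
    static Charset detect(byte[] b) {
        if (b.length >= 3
                && (b[0] & 0xff) == 0xEF
                && (b[1] & 0xff) == 0xBB
                && (b[2] & 0xff) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (b.length >= 2 && (b[0] & 0xff) == 0xFF && (b[1] & 0xff) == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        if (b.length >= 2 && (b[0] & 0xff) == 0xFE && (b[1] & 0xff) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        return null; // no BOM: fall back to the prolog or the transport headers
    }
}
```

Comparing the result with the encoding named in the prolog is one way to tell whether the producer or the consumer is at fault.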
> > >
> > > > What about the possibility of programmatically editing/cleaning
> > > > the response XML before it is given to the parser?
> > > >
> > > > Thanks
> > > > Matt
> > > >
> > > > -----Original Message-----
> > > > From: Matthew Brown [mailto:[EMAIL PROTECTED]]
> > > > Sent: Saturday, July 01, 2006 12:41 PM
> > > > To: [email protected]
> > > > Subject: Two questions - BOM in UTF-8, and manually cleaning XML
> > > >
> > > >
> > > > 1. From searching the mailing list archives, I see several
> > > > references to people having problems with Byte Order Mark
> > > > characters appearing before the prolog in their UTF-8 messages.
> > > > However I can't seem to find much of a known resolution to these
> > > > issues. Is there a standard/common workaround for these BOM and
> > > > UTF-8 issues?
> > > >
> > > > 2. If there is no answer to my #1, is there any way that Axis will
> > > > allow me to programmatically edit the response XML before it is
> > > > passed to the parser and de-serialized? I've tried adding
> > > > Handlers, but I'm assuming that the Handler comes into the
> > > > picture after the message is parsed, because my Handler is only
> > > > ever seeing the request message, and not the response.
> > > >
> > > > Thanks
> > > > Matt Brown
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > For additional commands, e-mail: [EMAIL PROTECTED]


--
Davanum Srinivas : http://www.wso2.net (Oxygen for Web Service Developers)



