What do you mean by "messed up"? What an editor or any other kind of display 
shows you and what is actually in your buffer in your code base, ready to be 
displayed, are two different pairs of shoes. 

 

1.       You need to look at how UTF-8 encodes the German umlauts. For that, 
go to Wikipedia at http://en.wikipedia.org/wiki/UTF-8 and read until you 
understand it.

2.       UTF-8 does not have a single byte for an "ä" or "ö" or "ü" but uses 2 
bytes for each of them. 

3.       How they are presented to you depends on many things: the OS, the 
application rendering the characters, the lower software layers in use 
(drivers etc.), and whichever conversions are selected. 

4.       So even perfectly correctly encoded characters can easily give you 
the impression of corruption when you look at them.
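
To make points 2 to 4 concrete, here is a minimal sketch in Python (not Axis 
code; the variable names are mine and purely illustrative). It assumes the 
viewer wrongly decodes the bytes as ISO-8859-1, which is only one of several 
ways a display can mislead you:

```python
# U+00E4 is the code point for the German umlaut a.
text = "\u00e4"

# UTF-8 needs two bytes for it: 0xC3 0xA4.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes.hex())        # c3a4

# ISO-8859-1 (Latin-1) needs only one byte: 0xE4.
latin1_bytes = text.encode("iso-8859-1")
print(latin1_bytes.hex())      # e4

# Assumption: a viewer treats the UTF-8 bytes as ISO-8859-1.
# It then shows two characters (U+00C3, U+00A4) instead of one.
misread = utf8_bytes.decode("iso-8859-1")
print([hex(ord(c)) for c in misread])   # ['0xc3', '0xa4']

# The data itself is intact: decoding as UTF-8 gives the umlaut back.
roundtrip = utf8_bytes.decode("utf-8")
print(roundtrip == text)       # True
```

If a dump of your buffer shows c3 a4 where the umlaut a should be, the data is 
most likely fine and only the display is wrong.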

 

[from WIKI for you]

UTF-8 encodes each of the 1,112,064 code points in the Unicode character set 
using one to four 8-bit bytes (termed "octets" in the Unicode Standard). 
Code points with lower numerical values (i.e., earlier code positions in the 
Unicode character set, which tend to occur more frequently in practice) are 
encoded using fewer bytes, making the encoding scheme reasonably efficient. In 
particular, the first 128 characters of the Unicode character set, which 
correspond one-to-one with ASCII, are encoded using a single octet with the 
same binary value as the corresponding ASCII character, making valid ASCII 
text valid UTF-8-encoded Unicode text as well.
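
The "one to four bytes" rule from the passage above can be checked with a 
short Python sketch. The sample characters are my own picks, one for each 
encoded length, and are not taken from the original question:

```python
# One sample character per UTF-8 length (illustrative choices).
samples = {
    "A": "\u0041",            # ASCII letter
    "a-umlaut": "\u00e4",     # German umlaut a
    "euro sign": "\u20ac",
    "G clef": "\U0001d11e",   # outside the Basic Multilingual Plane
}

# Number of bytes each character occupies once encoded as UTF-8.
lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}
print(lengths)  # {'A': 1, 'a-umlaut': 2, 'euro sign': 3, 'G clef': 4}

# Pure ASCII text has identical bytes in ASCII and in UTF-8.
ascii_text = "Hello"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print(same_bytes)  # True
```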

 

Josef

 

 

From: Iyengar, Kumar [mailto:kumar_iyen...@bmc.com] 
Sent: Wednesday, 18 May 2011 08:01
To: axis-u...@ws.apache.org
Subject: addChild with a node that has German characters

 

Hi all,

 

I am copying one node to another node. The Source node contains a child (Text) 
with German characters. After adding the child to the new node the German 
characters get messed up.

 

The source node contains a string with 'umlaut a' and in the destination this 
character is messed up.

 

I checked the factory from which the destination node is created and it has the 
default char set as "utf-8".

 

Does anyone know why this is happening?

 

Thanks,

 

--kumar
