After some debugging, I found out the following:

- Standard encoding used to serialize is UTF-8
- For the toString() method, everything is serialized to UTF-8, then a string is 
created from the bytes without specifying an encoding, so the platform default 
is used, which in my case is probably iso-latin-1

In my particular case, the pound sign is serialized to the bytes -62,-93, which 
is 11000010 10100011, the UTF-8 representation of the pound symbol 
(code point 00010100011); in iso-latin-1, however, this byte sequence represents 
a special A-character (0xc2) followed by the pound sign (which is, I'd say, 
coincidental).
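
A minimal sketch of the effect described above, using only the JDK (no Axiom 
involved). The class and method names are made up for illustration. It encodes 
the pound sign as UTF-8 and then decodes those two bytes as ISO-8859-1, which 
is what I assume happens inside toString() when no encoding is given.

```java
import java.nio.charset.StandardCharsets;

public class PoundMojibake {
    // Encode as UTF-8, then (wrongly) decode the same bytes as ISO-8859-1.
    public static String encodeUtf8DecodeLatin1(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Render one byte as eight binary digits.
    public static String toBinary(byte b) {
        String bits = Integer.toBinaryString(b & 0xFF);
        return "00000000".substring(bits.length()) + bits;
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00a3".getBytes(StandardCharsets.UTF_8);
        for (byte b : utf8) {
            // prints "-62 = 11000010" and then "-93 = 10100011"
            System.out.println(b + " = " + toBinary(b));
        }

        String wrong = encodeUtf8DecodeLatin1("\u00a3");
        // One character went in, two come out: 0xc2 followed by 0xa3.
        for (int i = 0; i < wrong.length(); i++) {
            System.out.println(Integer.toHexString(wrong.charAt(i)));
        }
    }
}
```

The code points are printed as hex rather than as characters, so the result 
does not depend on the console encoding.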

I'm using Axiom in combination with Axis2 and this is causing a problem. Can I 
do something about this? I don't know how Axis2 uses Axiom, but the weird 
characters show up in its output as well...
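
One possible workaround, sketched under assumptions: serialize to a byte 
stream and decode the bytes with an explicit UTF-8 charset instead of relying 
on toString(). The writeXml method below is a hypothetical stand-in for the 
real serializer; with Axiom I assume it could be replaced by something like 
documentElement.serialize(outputStream), but I have not checked that against 
this version.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetRoundTrip {
    // Hypothetical stand-in for a serializer that emits UTF-8 bytes.
    public static void writeXml(OutputStream out, String text) throws IOException {
        out.write(("<price>" + text + "</price>").getBytes(StandardCharsets.UTF_8));
    }

    // Serialize into a buffer, then decode with the SAME charset that was
    // used for encoding, never with the platform default.
    public static String serializeToString(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            writeXml(buffer, text);
        } catch (IOException e) {
            // An in-memory buffer should not fail; rethrow unchecked if it does.
            throw new IllegalStateException(e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String xml = serializeToString("\u00a3");
        // The pound sign survives as a single character (0xa3).
        System.out.println(Integer.toHexString(xml.charAt("<price>".length())));
    }
}
```

This only helps where I control the conversion to a String myself; it does not 
change what Axis2 does internally.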

Brecht

-----Original Message-----
From: Brecht Yperman [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, 10 October 2006 13:35
To: [email protected]
Subject: RE: Weird character in XML string

The Java sample code got removed, I'll add it to the body.



import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringBufferInputStream;
import java.io.StringWriter;
import java.io.Writer;

import javax.xml.stream.XMLStreamException;

import org.apache.axiom.om.OMElement;
import org.apache.axiom.om.impl.builder.StAXBuilder;
import org.apache.axiom.om.impl.builder.StAXOMBuilder;

public class TestReadXml {
        public static void main(String[] args) throws IOException,
                        XMLStreamException {
                
                
                String test = readFileToString(new File("bin/result.xml"),
                                "ISO8859_1");
                
                System.out.println(test);

                // XMLStreamReader parser =
                // XMLInputFactory.newInstance().createXMLStreamReader(ir);

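                // StringBufferInputStream is deprecated (it drops the high byte
                // of each char); encode explicitly instead.
                InputStream is = new java.io.ByteArrayInputStream(
                                test.getBytes("ISO8859_1"));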

                StAXBuilder builder = new StAXOMBuilder(is);
                OMElement documentElement = builder.getDocumentElement();
                System.out.println(documentElement.toString());
        }

        public static String readFileToString(File file, String encoding)
                        throws IOException {
                InputStream in = new java.io.FileInputStream(file);
                try {
                        return toString(in, encoding);
                } finally {
                        closeQuietly(in);
                }
        }

        public static String toString(InputStream input, String encoding)
                        throws IOException {
                StringWriter sw = new StringWriter();
                copy(input, sw, encoding);
                return sw.toString();
        }

        public static void copy(InputStream input, Writer output,
                        String encoding) throws IOException {
                InputStreamReader in = new InputStreamReader(input, encoding);
                copy(in, output);
        }

        public static int copy(Reader input, Writer output) throws IOException {
                char[] buffer = new char[DEFAULT_BUFFER_SIZE];
                int count = 0;
                int n = 0;
                while (-1 != (n = input.read(buffer))) {
                        output.write(buffer, 0, n);
                        count += n;
                }
                return count;
        }

        private static final int DEFAULT_BUFFER_SIZE = 1024 * 4;

        public static void closeQuietly(InputStream input) {
                if (input == null) {
                        return;
                }

                try {
                        input.close();
                } catch (IOException ioe) {
                        // ignored on purpose: nothing useful can be done here
                }
        }
}

-----Original Message-----
From: Brecht Yperman [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, 10 October 2006 13:33
To: [email protected]
Subject: Weird character in XML string


Hi,

I'm reading in an XML file with the pound sign in it (0xa3).

When I parse it using the StAXOMBuilder and then print 
documentElement.toString() to the console, a weird character (0xc2) appears in 
front of the pound sign.

What is happening?

Thanks a lot,
Brecht

Invenso - The "Integration Software" specialists.
____________________________________________
Brecht Yperman
Development Team

Direct: +32 (0)3 780 30 05
Email: [EMAIL PROTECTED]
INVENSO bvba
Industriepark-West 75
9100 Sint-Niklaas
Belgium - Europe

Phone: +32 (0)3 780 30 02
Fax: +32 (0)3 780 30 03
Email: [EMAIL PROTECTED] 
Website: www.invenso.com 
VAT BE 0477.834.668
RPR Sint-Niklaas


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

