[android-developers] Re: Help needed with parsing some XML data !

Bob Kerns Wed, 03 Feb 2010 04:26:43 -0800

Well, you found one way to get the encoding in there. A few more:

InputSource.setEncoding("iso-8859-1")
new InputStreamReader(stream, "iso-8859-1");
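
A rough sketch of those two options, written against the plain JAXP SAX API rather than Android's Xml helper. The class and method names, the Countries root element, and the sample bytes are made up for illustration; only the Country line is taken from this thread. Non-ASCII characters are written as \u escapes so the source file's own encoding can't interfere.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class EncodingOptions {
    /** Hypothetical handler: collects the CName attribute of every Country element. */
    static class NameCollector extends DefaultHandler {
        final List<String> names = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("Country".equals(qName)) {
                names.add(atts.getValue("CName"));
            }
        }
    }

    /** Runs a SAX parse over the given source and returns the collected names. */
    static List<String> parse(InputSource source) throws Exception {
        XMLReader xr = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        NameCollector handler = new NameCollector();
        xr.setContentHandler(handler);
        xr.parse(source);
        return handler.names;
    }

    /** Option 1: give the parser raw bytes and name the encoding on the InputSource. */
    static List<String> parseWithSetEncoding(InputStream stream) throws Exception {
        InputSource source = new InputSource(stream);
        source.setEncoding("iso-8859-1");
        return parse(source);
    }

    /** Option 2: decode the bytes yourself and give the parser characters. */
    static List<String> parseWithReader(InputStream stream) throws Exception {
        Reader reader = new InputStreamReader(stream, "iso-8859-1");
        return parse(new InputSource(reader));
    }

    /** Sample document, encoded as ISO-8859-1 bytes (assumed shape of the server's data). */
    static byte[] sampleBytes() {
        String xml = "<?xml version=\"1.0\" encoding=\"iso-8859-1\" ?>"
                + "<Countries><Country ID=\"2\" CName=\"\u00C5land eilanden\"/></Countries>";
        return xml.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws Exception {
        String expected = "\u00C5land eilanden";
        List<String> viaEncoding = parseWithSetEncoding(new ByteArrayInputStream(sampleBytes()));
        List<String> viaReader = parseWithReader(new ByteArrayInputStream(sampleBytes()));
        // Compare against the expected string instead of printing the characters,
        // so the console's encoding cannot mislead.
        System.out.println("setEncoding matches: " + expected.equals(viaEncoding.get(0)));
        System.out.println("Reader matches: " + expected.equals(viaReader.get(0)));
    }
}
```

Either way the parser ends up with the right characters; the difference is only who does the decoding.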


I'd argue that it should have gotten it from the <?xml ...
encoding="iso-8859-1"?> declaration -- I'm a bit surprised it didn't. But it's
something I'd never rely on if I know the encoding.

Anyway, re: your problem below. It's probably working right, up to the
point of the log statement.

The log stream is probably taking those bytes, and then later they're
being interpreted as UTF-8. Or it's taking the characters from the
string, encoding them as UTF-8 (via String.getBytes()), and
passing them off to a log stream that doesn't know about UTF-8.
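
To see how a correctly parsed string can still come out of a log looking wrong, here is a small sketch of one such path: the string is written out as UTF-8 bytes, and something downstream reads those bytes as ISO-8859-1. The class and method names are invented for illustration, and which path the Android log actually takes isn't established in this thread.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    /** Encodes a string as UTF-8, then decodes those same bytes as ISO-8859-1. */
    static String utf8ReadAsLatin1(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    /** Renders each UTF-16 code unit as U+XXXX so no display encoding is involved. */
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String correct = "\u00EB";                   // e with diaeresis, one character
        String garbled = utf8ReadAsLatin1(correct);  // two characters: U+00C3 U+00AB
        System.out.println(codeUnits(correct));
        System.out.println(codeUnits(garbled));
    }
}
```

Dumping code units like this, rather than the characters themselves, is one way to check whether the string in memory is right before blaming the parser.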

Try logging to a file. Or better yet, create test cases, and verify
the correct operation of your code via test suite, rather than via log
statements.

But if you have any control or influence over the server -- fix the
problem there. ISO-8859-* should be of purely historical interest in
interpreting old documents. The first draft of ISO-10646 came out
nearly 20 years ago, and UTF-8 has been around for nearly 18 years.
The world is international. It's time to put a stake in the heart of
these national encodings.

On Feb 3, 2:55 am, MobDev <[email protected]> wrote:
> Btw I also have tried this instead :
>
>  try {
>      URL url = new URL("http://www.myserver.com/xmlstream");
>      URLConnection conn = url.openConnection();
>      InputStream is = conn.getInputStream();
>      Xml.parse(is, Xml.Encoding.ISO_8859_1, new ExampleHandler());
>  } catch (Exception e) {
>      throw new RuntimeException(e);
>  }
>
> This time it won't just crash but for example if I print the output
> (through Log) I get
>
> Found attribute : Ã…land eilanden
> and
> Found attribute : AlbaniÃ« instead of Albanié
>
> So any input on this ?
>
> On 3 feb, 10:44, MobDev <[email protected]> wrote:
>
>
>
> > Actually this is the code for the second approach :
>
> > /* Create a URL we want to load some xml-data from. */
> > URL url = new URL("http://www.myserver.com/xmlstream");
>
> > /* Get a SAXParser from the SAXParserFactory. */
> > SAXParserFactory spf = SAXParserFactory.newInstance();
> > SAXParser sp = spf.newSAXParser();
>
> >  /* Get the XMLReader of the SAXParser we created. */
> > XMLReader xr = sp.getXMLReader();
> >  /* Create a new ContentHandler and apply it to the XML-Reader*/
> > ExampleHandler myExampleHandler = new ExampleHandler();
> > xr.setContentHandler(myExampleHandler);
>
> > /* Parse the xml-data from our URL. */
> > xr.parse(new InputSource(url.openStream()));
> > /* Parsing has finished. */
>
> > And the error I get is :
> > At line 40, column 23: not well-formed (invalid token)
>
> > which is around this XML line :
> > <Country ID="2" CName="Åland eilanden"/>
>
> > So where should I specify it's an ISO-8859-1 ?
> > Also I have been debugging the app, but I actually cannot see the
> > stacktrace, could you please direct me on how to show it on NetBeans ?
> > Every time I try to look at the exception thrown I will see several
> > variables but StackTrace will be null...
>
> > On 3 feb, 06:17, Bob Kerns <[email protected]> wrote:
>
> > > While I would expect your second approach to work, it's important to
> > > note that IT IS NOT REQUIRED TO WORK.
>
> > > The XML standard does not require XML processors to support anything
> > > other than UTF-8 or UTF-16.
>
> > > In this day and age, I would STRONGLY discourage use of anything other
> > > than UTF-8, or, rarely, UTF-16.
>
> > > Another factor to consider is how you're getting access to those
> > > characters. You must do this one of two ways:
>
> > > 1) Using a Reader set to read 8859-1
> > > -or-
> > > 2) Using an input stream, giving the raw bytes to the parser, letting
> > > it decode the 8859-1 characters.
>
> > > You WILL FAIL (and this is probably your problem, would be my guess)
> > > if you try to read using a Reader that's expecting UTF-8.
>
> > > A stacktrace should show which problem you have.
>
> > > On Feb 2, 6:42 am, MobDev <[email protected]> wrote:
>
> > > > Hi,
> > > > I am downloading a xml-type file from a webserver which starts out
> > > > with :
>
> > > > <?xml version="1.0" encoding="iso-8859-1" ?>
>
> > > > afterwards I get a list with loads of countries, some countries do
> > > > contain some letters like é and á.
>
> > > > I have tried to extract the data of the xml in two ways :
> > > > 1 - simply download the whole thing into a String, which will result
> > > > in those characters being seen as something like [] or on the Android
> > > > emulator (and device) I will see a triangle with a ? in it...
>
> > > > 2 - fetch the list with the SAXParser and XMLReader which will just
> > > > throw an exception telling me that there is some content error...
> > > > specifically at the line where the first country is with such a
> > > > character...
>
> > > > So is there some way to get this to work ? Can I read the iso-8859-1
> > > > encoded xml into the Parser ? Or is there some way to encode/decode
> > > > the received data into something actually usable ?
> > > > Any idea where the problem might be ?
>
> > > > Thanks in advance for any hints, tips, code or explanation :D

-- 
You received this message because you are subscribed to the Google
Groups "Android Developers" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/android-developers?hl=en
