[android-developers] Re: Help needed with parsing some XML data !

Bob Kerns Thu, 04 Feb 2010 04:55:13 -0800

The problem is, I can't reproduce your problem.

I don't know why you don't know how to see a stack trace in Eclipse,
so I'm not quite sure how to tell you how. :=)


* You should be in the Debug perspective.
* Select the Breakpoints view in the Debug perspective.
* Click on the little exclamation point icon (Add Java Exception
Breakpoint) on the Breakpoints view toolbar, and enter Exception, to
add a breakpoint on all Exceptions being thrown.
* Debug the program, and do any interactions required to trigger the
problem.
* The stack trace will appear in the Debug view.
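
If the debugger route is awkward, a rough alternative is to capture the
trace in code. This is only a sketch: StackTraces and render are names
made up for this example, and on a device you would more likely hand the
exception to the Android log than to stderr.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

final class StackTraces {
    /** Renders a throwable's stack trace as text, the same text printStackTrace() emits. */
    static String render(Throwable t) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        t.printStackTrace(out);
        out.flush();
        return buffer.toString();
    }

    public static void main(String[] args) {
        try {
            // Stand-in for the real xr.parse(...) call that fails.
            throw new IllegalStateException("simulated parse failure");
        } catch (IllegalStateException e) {
            System.err.println(render(e));
        }
    }
}
```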

If, however, I edit your code to do:
        InputSource in = new InputSource(url.openStream());
        in.setEncoding("utf-8");
        xr.parse(in);

I get:

Exception in thread "main" com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 2-byte UTF-8 sequence.
        at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:674)
        at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:362)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1742)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.scanLiteral(XMLEntityScanner.java:1064)
        at com.sun.org.apache.xerces.internal.impl.XMLScanner.scanAttributeValue(XMLScanner.java:813)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanAttribute(XMLDocumentFragmentScannerImpl.java:1539)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanStartElement(XMLDocumentFragmentScannerImpl.java:1316)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2747)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:807)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
        at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:107)
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
        at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
        at Foo.main(Foo.java:25)

Which isn't the same error at all.
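
For what it's worth, that message is what you would expect if the bytes
really are ISO-8859-1. There "Å" is the single byte 0xC5, which UTF-8
reads as the start of a two-byte sequence, and the "l" that follows
(0x6C) is not a valid second byte. A small sketch of that check, outside
any XML parser; Utf8Check and isValidUtf8 are names invented for this
example:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8Check {
    /** Returns true if the bytes are valid UTF-8, false if a strict decoder rejects them. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "\u00c5land eilanden"; // "Åland eilanden"
        // ISO-8859-1 bytes: 0xC5 followed by 'l' is not a legal UTF-8 sequence.
        System.out.println(isValidUtf8(name.getBytes(StandardCharsets.ISO_8859_1))); // prints false
        // The same text encoded as UTF-8 decodes cleanly.
        System.out.println(isValidUtf8(name.getBytes(StandardCharsets.UTF_8)));      // prints true
    }
}
```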

I think the problem may start earlier in the file, and be detected
here. I concur with Frank about validating your XML separately.
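
One way to validate the XML separately, away from the device, is to run
the raw bytes through a desktop SAX parser and report where it first
objects. A minimal sketch, assuming the document has been saved or
fetched into a byte array; WellFormedCheck and firstProblem are
hypothetical names, and the error text will differ from parser to parser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class WellFormedCheck {
    /** Returns null if the document parses, otherwise a description of the first problem. */
    static String firstProblem(byte[] xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            // The parser knows where it gave up; that is usually the useful part.
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return "unreadable: " + e;
        }
    }

    public static void main(String[] args) {
        // Deliberately broken input: <Country> is never closed.
        byte[] broken = "<Countries><Country></Countries>".getBytes(StandardCharsets.UTF_8);
        System.out.println(firstProblem(broken));
    }
}
```

Keep in mind the reported position is where the parser noticed the
problem, which can be later than where the bad data starts.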

But good luck with googling it. Other SAX parsers, notably including
Python's, give the same error, and lots of people have invalid input,
so you get a huge amount of noise. Looking at the SAX source to see
exactly how you get the error might help.

On Feb 4, 1:12 am, MobDev <developm...@mobilaria.com> wrote:
> hehe,
> my bad, I said NetBeans but I actually use Eclipse :P
> I'm still used to J2ME with NetBeans, that's why I mixed them up...
> Anyways, I do get the error (exception) which I already posted in one
> of my first posts :
> At line 40, column 23: not well-formed (invalid token)
>
> "They *MUST* be valid printing Unicode
> characters. No random control characters -- for example, ISO-8859-1
> byte values 0-31 (decimal). This will ALWAYS fail -- it is NOT well-
> formed XML. If that's the cause of your exceptions"
> Well I did paste the error-generating xml entry which for example is :
> <Country ID="2" CName="Åland eilanden"/>
>
> "It would help
> if you posted the URL to the XML "
> Unfortunately I am not allowed to do so :( It's a URL which is
> actually in use by our already existing software...
>
> Also I noticed my previous post was posted three times ! My apologies
> for that, I actually can't remember pushing the Send button three
> times :P
>
> On 3 feb, 19:02, Bob Kerns <r...@acm.org> wrote:
>
>
>
> > Yeah, that's not what I mean by a test case.
>
> > See http://junit.org as a starting point.
> > (The Android SDK includes some limited version of JUnit I don't
> > recognize. It's adequate for this purpose, but the full, modern
> > version is better. For non-device testing, you're not restricted to
> > the supplied one.)
>
> > Basically, a test case is code that you can run to *automatically*
> > test some specific aspect of a system. In this case, checking against
> > a known set of XML (so it doesn't change), looking for the known
> > desired result, and reporting failures (expected conditions not
> > matching) and errors (unexpected exceptions thrown).
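
To make that concrete, here is a rough sketch of such a test written in
plain Java so it runs without any framework; with JUnit the body of
run() would become a test method. Everything named here is an
assumption for illustration: CountryListCheck, countryNames (a stand-in
for the real handler code), the Countries wrapper element, and the ID
given to Albanië. Only the Åland line comes from this thread.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class CountryListCheck {
    /** Fixed input, so the test never depends on the live server. */
    static final String FIXTURE =
            "<?xml version=\"1.0\" encoding=\"iso-8859-1\" ?>"
            + "<Countries>"
            + "<Country ID=\"1\" CName=\"Albani\u00eb\"/>"
            + "<Country ID=\"2\" CName=\"\u00c5land eilanden\"/>"
            + "</Countries>";

    /** Stand-in for the code under test: collects every CName attribute. */
    static List<String> countryNames(byte[] xml) throws Exception {
        final List<String> names = new ArrayList<String>();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts) {
                        if ("Country".equals(qName)) {
                            names.add(atts.getValue("CName"));
                        }
                    }
                });
        return names;
    }

    /** Reports PASS, a failure (wrong result), or an error (unexpected exception). */
    static String run() {
        List<String> expected = Arrays.asList("Albani\u00eb", "\u00c5land eilanden");
        try {
            List<String> actual = countryNames(FIXTURE.getBytes(StandardCharsets.ISO_8859_1));
            return expected.equals(actual)
                    ? "PASS"
                    : "FAILURE: expected " + expected + " but got " + actual;
        } catch (Exception e) {
            return "ERROR: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The comparison is done on the strings themselves, so nothing depends on
how a dialog or the log happens to render them.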
>
> > This removes the variables from the equation. You're not depending on
> > how things display, either in the UI or in the log stream and
> > windows. You're not dependent on a human to notice a problem.
>
> > And best of all, you can automate it to always run when you build, so
> > if you break something later, you'll find out right away, while you know what
> > you changed. And you can make changes freely, with the security of
> > knowing that you won't have to go through some long test/debug cycle.
>
> > I didn't answer your question about getting a stacktrace earlier,
> > because you said "NetBeans". I'm old enough to remember when NetBeans
> > was the hot new thing -- but too old to remember how to do anything
> > with it. Try using Eclipse and the ADT plugin. It will show you the stack
> > trace in the same way it shows any other stack trace, as if you were
> > debugging locally. (I would expect NetBeans to, as well).
>
> > Or you can catch the exception, and use exception.printStackTrace() to
> > get it into the log (I'm surprised it's not already there).
>
> > This isn't really an Android problem, and it's not necessary to debug
> > it there. If you write your failing test cases, you can debug them on
> > your desktop computer, get them working, and you should be set to go
> > on the device.
>
> > Another thing to realize is that not all character values you can come
> > up with are legal XML content. They *MUST* be valid printing Unicode
> > characters. No random control characters -- for example, ISO-8859-1
> > byte values 0-31 (decimal), apart from tab (9), line feed (10), and
> > carriage return (13), which XML 1.0 does allow. Anything else in that
> > range will ALWAYS fail -- it is NOT well-formed XML. If that's the
> > cause of your exceptions, your two choices would be to fix it on the
> > server (probably by encoding this binary data) or to preprocess the
> > fake-XML into real XML before you feed it to the XML parser.
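
If it does come down to preprocessing, one rough approach is to drop
every character that XML 1.0 does not allow before parsing. This is a
sketch under assumptions: XmlCharFilter and its methods are invented
names, the ranges are the ones in the XML 1.0 "Char" production (tab,
line feed, and carriage return are the only permitted characters below
0x20), and silently deleting data may not be what the server's owners
want.

```java
final class XmlCharFilter {
    /** True for code points permitted by the XML 1.0 "Char" production. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the text with every disallowed code point removed. */
    static String stripIllegal(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp); // step over surrogate pairs as one unit
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+0001 is dropped, the tab survives.
        System.out.println(stripIllegal("A\u0001land\teilanden"));
    }
}
```

Note this works on already-decoded characters, so the bytes still have
to be read with the right encoding first.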
>
> > On Feb 3, 6:49 am, MobDev <developm...@mobilaria.com> wrote:
>
> > > well to begin with : thx for the explanation :D
> > > I was wondering about your statement :
> > > "Try logging to a file. Or better yet, create test cases, and verify
> > > the correct operation of your code via test suite, rather than via log
> > > statements. "
> > > I already tried in a test case, which was to write the incoming data
> > > to an AlertDialog, but the result was that those characters are shown
> > > on-screen with a rectangle with a ? in it... My idea (and test case)
> > > would be to "stream" a list of countries, and afterwards show this
> > > list onscreen so the user can select one...
> > > Our problem is that the whole system I am using is based on the ISO
> > > norm and cannot be changed to UTF-8 in a short period of time...
> > > Or am I misinterpreting your "test cases" and "test suite" ? And if so
> > > how should it have been interpreted ?
>
> > > On 3 feb, 13:26, Bob Kerns <r...@acm.org> wrote:
>
> > > > Well, you found one way to get the encoding in there. A few more:
>
> > > > InputSource.setEncoding("iso-8859-1")
> > > > new InputStreamReader(stream, "iso-8859-1");
>
> > > > I'd argue that it should have gotten it from the <?xml ...
> > > > encoding="iso-8859-1"?> declaration -- I'm a bit surprised it didn't.
> > > > But it's something I'd never rely on if I know the encoding.
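
Spelled out, the two options look roughly like this with a desktop SAX
parser. It is a sketch, not the original poster's code: EncodingOptions,
its method names, and the Countries wrapper element in the sample are
assumptions. When the parser is given a Reader it works on characters
and disregards the encoding named in the XML declaration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class EncodingOptions {
    /** Sample document, stored as ISO-8859-1 bytes like the real feed. */
    static final byte[] SAMPLE =
            ("<?xml version=\"1.0\" encoding=\"iso-8859-1\" ?>"
            + "<Countries><Country ID=\"2\" CName=\"\u00c5land eilanden\"/></Countries>")
            .getBytes(StandardCharsets.ISO_8859_1);

    /** Option 1: hand the parser raw bytes and name the encoding on the InputSource. */
    static InputSource bytesWithEncoding(InputStream stream) {
        InputSource in = new InputSource(stream);
        in.setEncoding("ISO-8859-1");
        return in;
    }

    /** Option 2: decode the bytes yourself and hand the parser characters. */
    static InputSource decodedReader(InputStream stream) {
        return new InputSource(new InputStreamReader(stream, StandardCharsets.ISO_8859_1));
    }

    /** Parses the source and returns the CName of the first Country element, or null. */
    static String firstCountryName(InputSource in) {
        final String[] found = new String[1];
        try {
            XMLReader xr = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            xr.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    if (found[0] == null && "Country".equals(qName)) {
                        found[0] = atts.getValue("CName");
                    }
                }
            });
            xr.parse(in);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return found[0];
    }

    public static void main(String[] args) {
        System.out.println(firstCountryName(bytesWithEncoding(new ByteArrayInputStream(SAMPLE))));
        System.out.println(firstCountryName(decodedReader(new ByteArrayInputStream(SAMPLE))));
    }
}
```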
>
> > > > Anyway, re: your problem below. It's probably working right, up to the
> > > > point of the log statement.
>
> > > > The log stream is probably taking those bytes, and then later they're
> > > > being interpreted as UTF-8. Or it's taking the characters from the
> > > > string, encoding them as UTF-8 (via String.getBytes()), and
> > > > passing them off to a log stream that doesn't know about UTF-8.
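
The second scenario is easy to reproduce on a desktop JVM. A minimal
sketch, assuming the viewer at the far end reads the bytes as
ISO-8859-1; MojibakeDemo and utf8ReadAsLatin1 are names made up for
this example:

```java
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    /** Encodes text as UTF-8, then misreads those bytes as ISO-8859-1. */
    static String utf8ReadAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "Albanië": the ë (U+00EB) becomes the two bytes 0xC3 0xAB in UTF-8,
        // which ISO-8859-1 shows as two characters, U+00C3 and U+00AB.
        String garbled = utf8ReadAsLatin1("Albani\u00eb");
        System.out.println(garbled.equals("Albani\u00c3\u00ab")); // prints true
    }
}
```

If the garbled log output has this shape, the string in memory was
decoded correctly and only the display step is wrong.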
>
> > > > Try logging to a file. Or better yet, create test cases, and verify
> > > > the correct operation of your code via test suite, rather than via log
> > > > statements.
>
> > > > But if you have any control or influence over the server -- fix the
> > > > problem there. ISO-8859-* should be of purely historical interest in
> > > > interpreting old documents. The first draft of ISO-10646 came out
> > > > nearly 20 years ago, and UTF-8 has been around for nearly 18 years.
> > > > The world is international. It's time to put a stake in the heart of
> > > > these national encodings.
>
> > > > On Feb 3, 2:55 am, MobDev <developm...@mobilaria.com> wrote:
>
> > > > > Btw I also have tried this instead :
>
> > > > >  try {
> > > > >                 URL url = new 
> > > > > URL("http://www.myserver.com/xmlstream";);
> > > > >                 URLConnection conn = url.openConnection();
> > > > >             InputStream is = conn.getInputStream();
> > > > >             Xml.parse(is, Xml.Encoding.ISO_8859_1, new ExampleHandler
> > > > > ());
> > > > >         } catch (Exception e)
> > > > >         {
> > > > >             throw new RuntimeException(e);
> > > > >         }
>
> > > > > > This time it won't crash, but if, for example, I print the output
> > > > > > (through Log) I get
>
> > > > > Found attribute : Ã…land eilanden
> > > > > and
> > > > > Found attribute : AlbaniÃ« instead of Albanié
>
> > > > > So any input on this ?
>
> > > > > On 3 feb, 10:44, MobDev <developm...@mobilaria.com> wrote:
>
> > > > > > Actually this is the code for the second approach :
>
> > > > > > /* Create a URL we want to load some xml-data from. */
> > > > > > > URL url = new URL("http://www.myserver.com/xmlstream");
>
> > > > > > > /* Get a SAXParser from the SAXParserFactory. */
> > > > > > SAXParserFactory spf = SAXParserFactory.newInstance();
> > > > > > SAXParser sp = spf.newSAXParser();
>
> > > > > >  /* Get the XMLReader of the SAXParser we created. */
> > > > > > XMLReader xr = sp.getXMLReader();
> > > > > >  /* Create a new ContentHandler and apply it to the XML-Reader*/
> > > > > > ExampleHandler myExampleHandler = new ExampleHandler();
> > > > > > xr.setContentHandler(myExampleHandler);
>
> > > > > > /* Parse the xml-data from our URL. */
> > > > > > xr.parse(new InputSource(url.openStream()));
> > > > > > /* Parsing has finished. */
>
> > > > > > And the error I get is :
> > > > > > At line 40, column 23: not well-formed (invalid token)
>
> > > > > > which is around this XML line :
> > > > > > <Country ID="2" CName="Åland eilanden"/>
>
> > > > > > > So where should I specify that it's ISO-8859-1 ?
> > > > > > > Also I have been debugging the app, but I actually cannot see the
> > > > > > > stacktrace, could you please direct me on how to show it in NetBeans ?
> > > > > > > Every time I try to look at the exception thrown I will see several
> > > > > > > variables but StackTrace will be null...
>
> > > > > > On 3 feb, 06:17, Bob Kerns <r...@acm.org> wrote:
>
> > > > > > > > While I would expect your second approach to work, it's important to
> > > > > > > > note that IT IS NOT REQUIRED TO WORK.
>
> > > > > > > > The XML standard does not require XML processors to support anything
> > > > > > > > other than UTF-8 or UTF-16.
>
> > > > > > > > In this day and age, I would STRONGLY discourage use of anything other
> > > > > > > > than UTF-8, or, rarely, UTF-16.
>
> > > > > > > > Another factor to consider is how you're getting access to those
> > > > > > > > characters. You must do this one of two ways:
>
> > > > > > > > 1) Using a Reader set to read 8859-1
> > > > > > > > -or-
> > > > > > > > 2) Using an input stream, giving the raw bytes to the parser, letting
> > > > > > > > it decode the 8859-1 characters.
>
> > > > > > > > You WILL FAIL (and this is probably your problem, would be my guess)
> > > > > > > > if you try to read using a Reader that's expecting UTF-8.
>
> > > > > > > A stacktrace should show which problem you have.
>
> > > > > > > On Feb 2, 6:42 am, MobDev <developm...@mobilaria.com> wrote:
>
> > > > > > > > > Hi,
> > > > > > > > > I am downloading an XML-type file from a webserver which starts out
> > > > > > > > > with :
>
> > > > > > > > > <?xml version="1.0" encoding="iso-8859-1" ?>
>
> > > > > > > > > afterwards I get a list with loads of countries, some countries do
> > > > > > > > > contain some letters like é and á.
>
> > > > > > > > > I have tried to extract the data of the xml in two ways :
> > > > > > > > > 1 - simply download the whole thing into a String, which will result
> > > > > > > > > in those characters being seen as something like [] or on the Android
> > > > > > > > > emulator (and device) I will see a triangle with a ? in it...
>
> > > > > > > > > 2 - fetch the list with the SAXParser and XMLReader which will just
> > > > > > > > > throw an exception telling me that there is some content error...
> > > > > > > > > specifically at the line where the first country is with such a
> > > > > > > > > character...
>

-- 
You received this message because you are subscribed to the Google
Groups "Android Developers" group.
To post to this group, send email to android-developers@googlegroups.com
To unsubscribe from this group, send email to
android-developers+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/android-developers?hl=en
