[android-developers] Re: SAXParser reports different qName on SDK 0.9 from SDK 1.0

ukchucktown Tue, 21 Oct 2008 05:21:02 -0700

Did anyone file this in the issue tracker? I think it needs to be
fixed. Unless I missed something, the latest version of the Android
SAX parser supports namespaces. In the absence of a namespace prefix,
a compliant parser should return the same value for qName and
localName. The Android parser does not: it returns the correct value
for qName and null or an empty string for localName. This bug will
break popular third-party libraries such as JDOM, which would
otherwise work fine on Android.
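
Until this is fixed, a handler can avoid depending on which of the two
fields a given parser fills in. The following is only a minimal sketch
using plain JAXP; the class ElementNameCollector and its methods
effectiveName and elementNames are names made up for illustration, not
part of any SDK, and I have not run it on a device:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects element names while tolerating parsers
// that populate only one of localName / qName.
class ElementNameCollector extends DefaultHandler {
    final List<String> names = new ArrayList<>();

    // Prefer localName; otherwise fall back to qName with any prefix removed.
    static String effectiveName(String localName, String qName) {
        if (localName != null && !localName.isEmpty()) {
            return localName;
        }
        if (qName == null) {
            return "";
        }
        int colon = qName.indexOf(':');
        return colon < 0 ? qName : qName.substring(colon + 1);
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        names.add(effectiveName(localName, qName));
    }

    // Parses an XML string and returns the element names in document order.
    static List<String> elementNames(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            ElementNameCollector handler = new ElementNameCollector();
            factory.newSAXParser()
                   .parse(new InputSource(new StringReader(xml)), handler);
            return handler.names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }
}
```

The fallback strips the prefix from qName so that prefixed and
unprefixed documents yield the same names; code that needs the prefix
should keep qName as it is.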


Grant

On Oct 2, 3:46 pm, Charlie Collins <[EMAIL PROTECTED]> wrote:
> Glad you got it worked out.
>
> Now back to the originally scheduled issue here ;).
>
> I DO think the 1.0 parser is better, more strict, but the qName
> localName thing should still work (localName now having content when qName did
> on previous versions actually seems like an IMPROVEMENT, but
> still they both should have content when configured as such), and the
> features should be settable - I think.  I wish an Android dev would
> chime in here too though?
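
On the features being settable: SAX2 defines two standard feature URIs
that control this behaviour, and a reader can be probed to see whether
it accepts them. A rough sketch follows, assuming a stock JAXP
implementation; SaxFeatureProbe, newReader and trySetFeature are
hypothetical names, and what the Android parser answers is exactly the
thing to test:

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

// Hypothetical probe: reports whether a reader accepts a SAX2 feature.
class SaxFeatureProbe {
    // Standard SAX2 feature URIs.
    static final String NAMESPACES =
            "http://xml.org/sax/features/namespaces";
    static final String NAMESPACE_PREFIXES =
            "http://xml.org/sax/features/namespace-prefixes";

    // Creates a namespace-aware XMLReader from the default factory.
    static XMLReader newReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("no usable SAX parser", e);
        }
    }

    // Returns true only if the feature was accepted and reads back as set.
    static boolean trySetFeature(XMLReader reader, String name, boolean value) {
        try {
            reader.setFeature(name, value);
            return reader.getFeature(name) == value;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            return false;
        }
    }
}
```

Calling trySetFeature(newReader(), NAMESPACE_PREFIXES, true) on each
SDK version would show whether the feature is really settable there.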
>
> On Oct 2, 3:36 pm, Chris Cicc <[EMAIL PROTECTED]> wrote:
>
> > Hey Charlie and Brad,
> > Good news! I now have the ampersand being parsed correctly. However
> > the change needed wasn't what we expected. Every time I changed the
> > raw text in the database from the '&' character to the escaped '&amp;'
> > or '&#038;' it didn't work, it would still break at that first
> > ampersand.
>
> > After doing some more research, I came across this:
> > http://java.sun.com/j2ee/1.4/docs/tutorial/doc/JAXPSAX3.html
>
> > In that document, it says:
>
> > "Note: To be strictly accurate, the character handler should scan the
> > buffer for ampersand characters (&) and left-angle bracket characters
> > (<) and replace them with the strings &amp; or &lt;, as appropriate.
> > You'll find out more about that kind of processing when we discuss
> > entity references in Displaying Special Characters and CDATA. "
>
> > I was working on implementing that, when I read further about CDATA
> > sections. I decided to try this, by implementing the
> > XmlDocument.CreateCDataSection method instead of the
> > XmlDocument.CreateTextNode method I'm currently using in my .Net web
> > service. Without having to modify my SAXParser code at all it worked
> > with the new CDATA section!
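
For what it is worth, an escaped ampersand and a CDATA section should
reach the handler as the same characters. A small sketch of that,
assuming a standard JAXP parser; TextCollector and textOf are
illustrative names only:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: gathers all character data in a document.
class TextCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    // characters() may be called several times for one text node,
    // so the chunks are appended rather than overwritten.
    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    // Parses an XML string and returns its concatenated character data.
    static String textOf(String xml) {
        try {
            TextCollector handler = new TextCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.text.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }
}
```

Here textOf("<n>Tom &amp; Jerry</n>") and
textOf("<n><![CDATA[Tom & Jerry]]></n>") both give "Tom & Jerry". A
handler that keeps only the last characters() chunk can look as if the
text breaks at the ampersand, which may be worth ruling out.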
>
> > So what did I learn:
> > 1. The SAXParser does indeed break like this by design.
> > 2. Android betas apparently did not implement a properly spec'ed
> > SAXParser.
> > 3. The SAXParser may be lightweight, but it comes at the cost of
> > parsing robustness. For instance, the built-in .Net parser does not
> > have this issue. It simply reads the node, then everything after the
> > node until it reaches an end node. It's smart enough to detect full
> > nodes on their own, without simply assuming anything after '<' is a
> > new node like SAX does.
>
> > Going forward, I'm going to keep using CDATA sections, and look to
> > replace the parser if needed in the future.
>
> > As a developer, I'm really disappointed a Google rep didn't chime in
> > on this conversation. I'm used to having everything posted at
> > forums.asp.net read by Microsoft devs. But I appreciate all the help
> > of fellow community members like yourselves!
>
> > -chris
>
> > On Oct 2, 7:24 am, Charlie Collins <[EMAIL PROTECTED]> wrote:
>
> > > It's just &#038;, or &amp;
>
> > > http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_re...
>
> > > The & and the ; delimit the entity.
>
> > > But Chris, your XML in your source example there can't have an
> > > ampersand there like that.  You need to be using the escape/encoding.
> > > If everything in the chain supports UTF-8 you can use &amp;, if not,
> > > use the numerical entity version, &#038;.
>
> > > Again, this is a different topic than the differences in the parsing,
> > > but every XML processor I have ever seen will blow up on a non-escaped
> > > ampersand.
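
This is easy to check outside Android. The sketch below assumes a
desktop JVM with the default JAXP parser; WellFormedCheck and
isWellFormed are made-up names. It reports whether a string parses
without a fatal error:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: true if the parser accepts the document.
class WellFormedCheck {
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)),
                           new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // Fatal well-formedness errors surface as SAXException.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("parser setup failed", e);
        }
    }
}
```

With this, "<a>Tom & Jerry</a>" is rejected, while
"<a>Tom &amp; Jerry</a>" and "<a>Tom &#038; Jerry</a>" are accepted.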
>
> > > "Characters (or code points in Unicode terminology) outside the simple
> > > ASCII range 32-127 (&#x20; to &#x7F;) must either be encoded as multi-
> > > byte UTF-8 sequences or using numerical entities. In environments that
> > > do not natively support UTF-8 it is often easier to use numerical
> > > entities"
>
> > > For example, the XML I am using is coming from Google Base - it is
> > > UTF-8, but you STILL have to use the encoding to escape the special
> > > chars:
>
> > > <?xml version='1.0' encoding='UTF-8'?>
> > > <feed xmlns='http://www.w3.org/2005/Atom'
> > >         xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'
> > >         xmlns:gm='http://base.google.com/ns-metadata/1.0'
> > >         xmlns:g='http://base.google.com/ns/1.0'
> > >         xmlns:batch='http://schemas.google.com/gdata/batch'>
> > >         <id>http://www.google.com/base/feeds/snippets
> > >         </id>
> > >         <updated>2008-09-29T18:18:13.843Z</updated>
> > >         <title type='text'>Items matching query: ([review
> > >                 type:restaurant][location:Atlanta, GA]) [item type == "reviews"]
> > >         </title>
> > >         <link rel='alternate' type='text/html' href='http://base.google.com'/>
> > >         <link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml'
> > >                 href='http://www.google.com/base/feeds/snippets'/>
> > >         <link rel='http://schemas.google.com/g/2005#batch' type='application/atom+xml'
> > >                 href='http://www.google.com/base/feeds/snippets/batch'/>
> > >         <link rel='self' type='application/atom+xml'
> > >                 href='http://www.google.com/base/feeds/snippets/-/reviews?start-index=1&amp;max-results=8&amp;bq=%5Breview+type%3Arestaurant%5D%5Blocation%3AAtlanta%2C+GA%5D' />
> > >         <link rel='next' type='application/atom+xml'
> > >                 href='http://www.google.com/base/feeds/snippets/-/reviews?start-index=9&amp;max-results=8&amp;bq=%5Breview+type%3Arestaurant%5D%5Blocation%3AAtlanta%2C+GA%5D' />
> > >         <author>
> > >                 <name>Google Inc.</name>
> > >                 <email>[EMAIL PROTECTED]</email>
> > >         </author>
> > >         <generator version='1.0' uri='http://base.google.com'>GoogleBase</generator>
> > >         <openSearch:totalResults>199</openSearch:totalResults>
> > >         <openSearch:startIndex>1</openSearch:startIndex>
> > >         <openSearch:itemsPerPage>8</openSearch:itemsPerPage>
> > >         <entry>
> > > . . . . .
>
> > > On Oct 1, 7:18 pm, "Brad Gies" <[EMAIL PROTECTED]> wrote:
>
> > > > Charlie,
>
> > > > Yes, I think we are saying ALMOST the same thing. But, I don't think
> > > > &#038; is the Escaped Ampersand. I think it's just the Ampersand, and
> > > > that's why it's causing the problem.
>
> > > > As I say, I'm not a Unicode expert, but I think the proper sequence for
> > > > an escaped ampersand would be: &#038; &#038; I think that's how an escaped
> > > > ampersand would look in UTF-8. The ampersand escaping the ampersand :).
> > > > Or, of course the &amp;
>
> > > > Sorry, I can't try it right now, but I'm interested to know if it works.
> > > > When I have time, I'll build an app to check it.
>
> > > > Sincerely,
>
> > > > Brad Gies
>
> > > > -----------------------------------------------------------------
> > > > Brad Gies
> > > > 27415 Greenfield Rd, # 2,
> > > > Southfield, MI, USA
> > > > 48076
> > > > www.bgies.com www.truckerphone.com www.EDI-Easy.com www.pricebunny.com
> > > > -----------------------------------------------------------------
>
> > > > Moderation in everything, including abstinence
>
> > > > -----Original Message-----
> > > > From: android-developers@googlegroups.com
> > > > [mailto:[EMAIL PROTECTED] On Behalf Of Chris Cicc
> > > > Sent: Tuesday, September 30, 2008 10:10 AM
> > > > To: Android Developers
> > > > Subject: [android-developers] Re: SAXParser reports different qName on SDK
> > > > 0.9 from SDK 1.0
>
> > > > Hey Brad,
> > > > Just to be sure I tested it out and manually typed in "&amp;" into the
> > > > source for the web service. I didn't expect this to work, because even
> > > > manually typing it in still leads to each character being encoded.
>
> > > > In the quote you provided it says "they MUST be escaped using either
> > > > numeric character references...". UTF-8 (and all unicode) encoding
> > > > does just that :) The '&' is number 38.
>
> > > > On the other hand, I also tested the bracket characters < and >. Both
> > > > cause the same issue as the & character. Other brackets such as [ and
> > > > { and ( do not cause issue.
>
> > > > So clearly this does have something to do with the SAXParser in
> > > > Android handling the special XML characters. I have never used
> > > > SAXParser outside of Android so I cannot say whether or not it is any
> > > > different. But I can confirm that this did not happen in 0.9 and I am
> > > > 99% confident it should not be happening at all.
>
> > > > Thanks,
> > > > Chris