Hey Charlie and Brad,

Good news! I now have the ampersand being parsed correctly. However, the change needed wasn't what we expected. Every time I changed the raw text in the database from the '&' character to the escaped '&amp;' or '&#38;', it didn't work; it would still break at that first ampersand.
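For readers who want to reproduce the "breaks at the first ampersand" symptom described above, here is a minimal sketch in Java. It is not the code from this thread. It rests on one documented property of SAX: a parser may deliver the text of a single element through several characters() calls, and some parsers split at entity references. Whether that explains the behaviour seen on Android is an assumption on my part. The class name, the handler, and the <name> element are hypothetical, and the sketch targets a desktop JVM rather than the Android SDK.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; none of these names come from the thread.
class AmpersandDemo {

    // Buffers text across characters() calls instead of keeping only one chunk.
    static class NameHandler extends DefaultHandler {
        private final StringBuilder buffer = new StringBuilder();
        String lastName;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            buffer.setLength(0); // start a fresh buffer for each element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called more than once per element, so append rather than assign.
            buffer.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // Assumes the default, non-namespace-aware factory, where qName is "name".
            if ("name".equals(qName)) {
                lastName = buffer.toString();
            }
        }
    }

    static String parseName(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        NameHandler handler = new NameHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.lastName;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseName("<name>Fish &amp; Chips</name>")); // prints Fish & Chips
        System.out.println(parseName("<name>Fish &#38; Chips</name>")); // prints Fish & Chips
    }
}
```

If a handler assigns the text in characters() instead of appending it, any split at the entity would leave only one fragment, which would look like the value stopping at the ampersand.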
After doing some more research, I came across this:

http://java.sun.com/j2ee/1.4/docs/tutorial/doc/JAXPSAX3.html

In that document, it says:

"Note: To be strictly accurate, the character handler should scan the buffer for ampersand characters (&) and left-angle bracket characters (<) and replace them with the strings &amp; or &lt;, as appropriate. You'll find out more about that kind of processing when we discuss entity references in Displaying Special Characters and CDATA."

I was working on implementing that when I read further about CDATA sections. I decided to try this by calling the XmlDocument.CreateCDataSection method instead of the XmlDocument.CreateTextNode method I'm currently using in my .NET web service. Without having to modify my SAXParser code at all, it worked with the new CDATA section!

So what did I learn:

1. The SAXParser does indeed break like this by design.
2. The Android betas apparently did not implement a properly spec'ed SAXParser.
3. The SAXParser may be lightweight, but it comes at the cost of parsing robustness. For instance, the built-in .NET parser does not have this issue. It simply reads the node, then everything after the node until it reaches an end node. It's smart enough to detect full nodes on their own, without simply assuming anything after '<' is a new node like SAX does.

Going forward, I'm going to keep using CDATA sections, and look to replace the parser if needed in the future.

As a developer, I'm really disappointed a Google rep didn't chime in on this conversation. I'm used to having everything posted at forums.asp.net read by Microsoft devs. But I appreciate all the help of fellow community members like yourselves!

-chris

On Oct 2, 7:24 am, Charlie Collins <[EMAIL PROTECTED]> wrote:
> It's just &amp;, or &#38;
>
> http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_re...
>
> The & and the ; delimit the entity.
>
> But Chris, your XML in your source example there can't have an
> ampersand there like that.
> You need to be using the escape/encoding.
> If everything in the chain supports UTF-8 you can use &amp;, if not,
> use the numerical entity version, &#38;.
>
> Again, this is a different topic than the differences in the parsing,
> but every XML processor I have ever seen will blow up on a non-escaped
> ampersand.
>
> "Characters (or code points in Unicode terminology) outside the simple
> ASCII range 32-127 (&#32; to &#127;) must either be encoded as multi-
> byte UTF-8 sequences or using numerical entities. In environments that
> do not natively support UTF-8 it is often easier to use numerical
> entities"
>
> For example, the XML I am using is coming from Google Base - it is
> UTF-8, but you STILL have to use the encoding to escape the special
> chars:
>
> <?xml version='1.0' encoding='UTF-8'?>
> <feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://
> a9.com/-/spec/opensearchrss/1.0/'
> xmlns:gm='http://base.google.com/ns-metadata/1.0' xmlns:g='http://
> base.google.com/ns/1.0'
> xmlns:batch='http://schemas.google.com/gdata/batch'>
> <id>http://www.google.com/base/feeds/snippets
> </id>
> <updated>2008-09-29T18:18:13.843Z</updated>
> <title type='text'>Items matching query: ([review
> type:restaurant][location:Atlanta, GA]) [item type ==
> "reviews"]
> </title>
> <link rel='alternate' type='text/html' href='http://base.google.com'/
> >
> <link rel='http://schemas.google.com/g/2005#feed' type='application/
> atom+xml'
> href='http://www.google.com/base/feeds/snippets'/>
> <link rel='http://schemas.google.com/g/2005#batch' type='application/
> atom+xml'
> href='http://www.google.com/base/feeds/snippets/batch'/>
> <link rel='self' type='application/atom+xml'
>
> href='http://www.google.com/base/feeds/snippets/-/reviews?start-
> index=1&amp;max-results=8&amp;bq=%5Breview+type%3Arestaurant%5D
> %5Blocation%3AAtlanta%2C+GA%5D' />
> <link rel='next' type='application/atom+xml'
>
> href='http://www.google.com/base/feeds/snippets/-/reviews?start-
> index=9&amp;max-results=8&amp;bq=%5Breview+type%3Arestaurant%5D
> %5Blocation%3AAtlanta%2C+GA%5D' />
> <author>
> <name>Google Inc.</name>
> <email>[EMAIL PROTECTED]</email>
> </author>
> <generator version='1.0' uri='http://base.google.com'>GoogleBase</
> generator>
> <openSearch:totalResults>199</openSearch:totalResults>
> <openSearch:startIndex>1</openSearch:startIndex>
> <openSearch:itemsPerPage>8</openSearch:itemsPerPage>
> <entry>
> . . . . .
>
> On Oct 1, 7:18 pm, "Brad Gies" <[EMAIL PROTECTED]> wrote:
>
> > Charlie,
>
> > Yes, I think we are saying ALMOST the same thing. But, I don't think &
> > is the Escaped Ampersand. I think it's just the Ampersand, and that's why
> > it's causing the problem.
>
> > As I say, I'm not a Unicode expert, but I think the proper sequence for an
> > escaped ampersand would be : & & I think that's how an escaped
> > ampersand would look in UTF-8. The ampersand escaping the ampersand :). Or,
> > of course the &
>
> > Sorry, I can't try it right now, but I'm interested to know if it works.
> > When I have time, I'll build an app to check it.
>
> > Sincerely,
>
> > Brad Gies
>
> > -----------------------------------------------------------------
> > Brad Gies
> > 27415 Greenfield Rd, # 2,
> > Southfield, MI, USA
> > 48076
> > www.bgies.com www.truckerphone.com
> > www.EDI-Easy.com www.pricebunny.com
> > -----------------------------------------------------------------
>
> > Moderation in everything, including abstinence
>
> > -----Original Message-----
> > From: android-developers@googlegroups.com
> > [mailto:[EMAIL PROTECTED] On Behalf Of Chris Cicc
> > Sent: Tuesday, September 30, 2008 10:10 AM
> > To: Android Developers
> > Subject: [android-developers] Re: SAXParser reports diffeernt qName on SDK
> > 0.9 from SDK 1.0
>
> > Hey Brad,
> > Just to be sure I tested it out and manually typed in "&" into the
> > source for the web service. I didn't expect this to work, because even
> > manually typing it in still leads to each character being encoded.
>
> > In the quote you provided it says "they MUST be escaped using either
> > numeric character references...". UTF-8 (and all Unicode) encoding
> > does just that :) The '&' is number 38.
>
> > On the other hand, I also tested the bracket characters < and >. Both
> > cause the same issue as the & character. Other brackets such as [ and
> > { and ( do not cause an issue.
>
> > So clearly this does have something to do with the SAXParser in
> > Android handling the special XML characters. I have never used
> > SAXParser outside of Android so I cannot say whether or not it is any
> > different. But I can confirm that this did not happen in 0.9 and I am
> > 99% confident it should not be happening at all.
>
> > Thanks,
> > Chris

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Android Developers" group.
To post to this group, send email to android-developers@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/android-developers?hl=en
-~----------~----~----~----~------~----~------~--~---