[android-developers] Re: SAXParser throws exception for bad character in CDATA block, bug???

Jens Thu, 21 Apr 2011 03:15:34 -0700

No, it's a bug in your feed.

CDATA is "not parsed" in the sense that characters that otherwise
would be recognized as mark-up are ignored - it's not a carte blanche
to add binary junk/"illegal characters" to XML.
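
To see the difference for yourself, here is a minimal sketch that runs on a desktop JDK (not on Android, where the parser is a different implementation, so the exact error text will differ). The class name CdataDemo and the helper isWellFormed are made up for this illustration. Markup-like characters inside CDATA should be accepted, while a control character such as U+0001 should be rejected, CDATA or not.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class CdataDemo {
    /** Returns true if the platform's default SAX parser accepts the document. */
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // Well-formedness errors are reported as SAXException (SAXParseException).
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // '<' and '&' would be markup elsewhere, but are plain text inside CDATA.
        System.out.println(isWellFormed("<a><![CDATA[<b> & </c>]]></a>"));
        // U+0001 is outside the XML 1.0 Char production, inside CDATA or not.
        System.out.println(isWellFormed("<a><![CDATA[bad " + (char) 0x01 + " char]]></a>"));
    }
}
```

On a stock JDK I would expect the first call to report true and the second false.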


Try passing something like this instead (wrapped in an InputSource) to
the SAXParser#parse method.

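import java.io.IOException;
import java.io.Reader;

/** Wraps a Reader and silently drops characters that XML 1.0 does not allow. */
class StripReader extends Reader {
    private final Reader mReader;

    public StripReader(Reader reader) {
        mReader = reader;
    }

    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public void close() throws IOException {
        mReader.close();
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        int n = 0;
        int ch = 0;
        while (n < len) {
            ch = read();
            if (ch == -1) {
                break; // end of stream: stop instead of polling the reader again
            }
            cbuf[off + n++] = (char) ch;
        }
        return (n == 0 && ch == -1) ? -1 : n;
    }

    @Override
    public int read() throws IOException {
        int ch;
        // Skip over invalid characters until a valid one or end of stream.
        do {
            ch = mReader.read();
        } while (ch != -1 && !validChar(ch));
        return ch;
    }

    private boolean validChar(int ch) {
        // XML 1.0: Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
        // i.e. any Unicode character, excluding the surrogate blocks, FFFE, and FFFF.
        // Reader#read() returns UTF-16 code units, so characters above #xFFFF arrive as
        // surrogate pairs and are dropped by this check; the last range only matches
        // if the method is given full code points.
        return ch == 0x9 || ch == 0xA || ch == 0xD
                || (ch >= 0x20 && ch <= 0xD7FF)
                || (ch >= 0xE000 && ch <= 0xFFFD)
                || (ch >= 0x10000 && ch <= 0x10FFFF);
    }
}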
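
For completeness, a rough sketch of the "wrapped in an InputSource" part, again for a desktop JDK. So that it compiles on its own it carries a compact stand-in for the StripReader above (CompactStripReader, which filters a whole buffer at a time instead of char by char); the names StripReaderUsage and parseText are likewise invented for this example. In a real app you would pass a reader over the feed's stream instead of a StringReader.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class StripReaderUsage {
    /** Compact stand-in for StripReader so that this sketch is self-contained. */
    static final class CompactStripReader extends Reader {
        private final Reader in;

        CompactStripReader(Reader in) {
            this.in = in;
        }

        @Override
        public int read(char[] cbuf, int off, int len) throws IOException {
            if (len == 0) {
                return 0;
            }
            int kept = 0;
            while (kept == 0) {
                int got = in.read(cbuf, off, len);
                if (got == -1) {
                    return -1;
                }
                // Compact the buffer in place, keeping only valid code units.
                for (int i = off; i < off + got; i++) {
                    char c = cbuf[i];
                    if (c == 0x9 || c == 0xA || c == 0xD
                            || (c >= 0x20 && c <= 0xD7FF)
                            || (c >= 0xE000 && c <= 0xFFFD)) {
                        cbuf[off + kept++] = c;
                    }
                }
            }
            return kept;
        }

        @Override
        public void close() throws IOException {
            in.close();
        }
    }

    /** Parses xml through the filter and returns all character data concatenated. */
    static String parseText(String xml) {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        try {
            Reader filtered = new CompactStripReader(new StringReader(xml));
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(filtered), handler);
        } catch (SAXException | ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // The U+0001 in the CDATA block would normally abort the parse.
        String feed = "<item><![CDATA[It" + (char) 0x01 + "s fine]]></item>";
        System.out.println(parseText(feed));
    }
}
```

Note that when you hand the parser a Reader you are responsible for decoding the bytes yourself, so the charset you pick for the underlying InputStreamReader has to match what the feed really sends.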

On 15 Apr, 22:54, Phil Bayfield <p...@bayfmail.com> wrote:
> I'm having an issue with SAXParser on an RSS feed from a vBulletin forum.
>
> The parser throws SAXException - At line 212, column 26: not well-formed
> (invalid token) when it encounters a right apostrophe character:
> http://www.fileformat.info/info/unicode/char/2019/index.htm
>
> I realise this is a Unicode character and the feed is ISO-8859-1, however
> the character falls in a CDATA block, which the parser is supposed to
> ignore.
>
> Has anyone encountered this before and know of a workaround? I've tried
> things like forcing UTF-8 with no luck.
>
> Is this a bug that the parser is not ignoring data in the CDATA block?

