SAXParseException: Illegal: ]]>

Mariano Kamp Wed, 22 Apr 2009 04:53:05 -0700

Hi,

  I am not sure if this is the right list, but I thought I'd start out
where the stack trace points me ;-)


  I get a SAXParseException when parsing an Atom feed from Google Reader:

org.xml.sax.SAXParseException: Illegal: ]]> (position:START_TAG
<category term='user/xyz/state/com.google/fresh'>@5:15061 in
java.io.bufferedrea...@4348a0e8)
        at org.apache.harmony.xml.parsers.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:151)
        at com.newsrob.U.parseXMLfromInputStream(U.java:45)
        at com.newsrob.EntriesRetriever.fetchNewEntries(EntriesRetriever.java:299)
        at com.newsrob.SynchronizationService$4.run(SynchronizationService.java:172)
        at com.newsrob.SynchronizationService.doSync(SynchronizationService.java:337)
        at com.newsrob.SynchronizationService.access$0(SynchronizationService.java:86)
        at com.newsrob.SynchronizationService$1.run(SynchronizationService.java:75)
        at java.lang.Thread.run(Thread.java:935)

  I think the problem originates here (see the last category tag):

<category term="user/xyz/state/com.google/reading-list"
scheme="http://www.google.com/reader/" label="reading-list"/><category
term="user/xyz/state/com.google/fresh"
scheme="http://www.google.com/reader/" label="fresh"/><category
term="&lt;![CDATA[ Agenda ]]&gt;"/>

  Any idea why this happens?

  This is the (abbreviated) code I use to parse the stream from Google.

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setCoalescing(true); // added this later with no effect

DocumentBuilder builder = dbf.newDocumentBuilder();

// "is" is the InputStream of the response from Google
BufferedReader br = new BufferedReader(
        new InputStreamReader(is, "UTF-8"), 8 * 1024);
builder.parse(new InputSource(br));

  ???

  Maybe Google doesn't generate proper XML? I don't know. I originally
converted my code back to use DOM like above, because I got the same
problem with kXML, but they state in their documentation that it
doesn't support this escaping, I think:

  In order to keep kXML as small as possible, no efforts are made to recognize
certain well-formedness errors that would require additional detection code,
such as
   - ']]>' contained in text content,
   - duplicate attributes, and
   - <? followed by a space before the target
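
  To check whether the feed itself is at fault, here is a minimal
standalone sketch that parses only the suspicious attribute with the
stock JAXP parser. It assumes a desktop JDK rather than Android; the
class name CdataAttributeRepro, the helper parseTerm and the <entry>
wrapper element are made up for illustration and are not part of my app.
As far as I can tell, the XML spec only forbids a literal ']]>' in text
content, not an escaped ']]&gt;' inside an attribute value, so this
should parse cleanly:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical reproduction class, not part of the original application.
class CdataAttributeRepro {

    // Parses the given XML and returns the "term" attribute of the
    // first <category> element.
    static String parseTerm(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = dbf.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        Element category = (Element) doc.getElementsByTagName("category").item(0);
        return category.getAttribute("term");
    }

    public static void main(String[] args) throws Exception {
        // Same attribute value as the last category tag in the feed,
        // wrapped in an assumed <entry> root element.
        String xml = "<entry><category term=\"&lt;![CDATA[ Agenda ]]&gt;\"/></entry>";
        System.out.println(parseTerm(xml));
    }
}
```

  If this prints the unescaped attribute value on a desktop JVM but the
same input fails on the device, that would point at the parser on the
device rather than at Google's XML.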


Cheers,
Mariano
