Hi,
I am not sure if this is the right list, but I thought I would start
where the stack trace points me ;-)
I get a SAXParseException when parsing an Atom feed from Google Reader:
org.xml.sax.SAXParseException: Illegal: ]]> (position:START_TAG
<category term='user/xyz/state/com.google/fresh'>@5:15061 in
java.io.bufferedrea...@4348a0e8)
    at org.apache.harmony.xml.parsers.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:151)
    at com.newsrob.U.parseXMLfromInputStream(U.java:45)
    at com.newsrob.EntriesRetriever.fetchNewEntries(EntriesRetriever.java:299)
    at com.newsrob.SynchronizationService$4.run(SynchronizationService.java:172)
    at com.newsrob.SynchronizationService.doSync(SynchronizationService.java:337)
    at com.newsrob.SynchronizationService.access$0(SynchronizationService.java:86)
    at com.newsrob.SynchronizationService$1.run(SynchronizationService.java:75)
    at java.lang.Thread.run(Thread.java:935)
I think the problem originates here (see the last category tag):
<category term="user/xyz/state/com.google/reading-list"
          scheme="http://www.google.com/reader/" label="reading-list"/>
<category term="user/xyz/state/com.google/fresh"
          scheme="http://www.google.com/reader/" label="fresh"/>
<category term="<![CDATA[ Agenda ]]>"/>
Any idea why this happens?
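One way to narrow this down might be to feed just the suspicious attribute to a parser outside Android. The sketch below assumes a desktop JDK with its bundled JAXP parser (not the Harmony/kXML one from the stack trace), and the class and method names are made up for illustration. It tries three variants of the last category tag: as pasted above, with only the '<' escaped, and fully escaped. As far as I read the XML 1.0 grammar, a raw '<' is not allowed inside an attribute value, while the rule against ']]>' applies to text content rather than to attributes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the code shown in this mail.
class CategoryTermCheck {

    // Returns the "term" attribute of the root element,
    // or null if the parser rejects the document as not well-formed.
    static String parseTerm(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = dbf.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement().getAttribute("term");
        } catch (SAXParseException e) {
            return null; // the parser considers the input not well-formed
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] variants = {
            "<category term=\"<![CDATA[ Agenda ]]>\"/>",        // as pasted in this mail
            "<category term=\"&lt;![CDATA[ Agenda ]]>\"/>",     // only '<' escaped
            "<category term=\"&lt;![CDATA[ Agenda ]]&gt;\"/>",  // fully escaped
        };
        for (String xml : variants) {
            String term = parseTerm(xml);
            System.out.println(xml + " -> "
                    + (term == null ? "rejected" : "term=" + term));
        }
    }
}
```

If the raw bytes of the feed turn out to contain the escaped form, the feed itself would seem fine and the complaint about ']]>' would point at the parser on the device rather than at Google.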
This is the (abbreviated) code I use to parse the stream from Google.
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setCoalescing(true); // added this later with no effect
DocumentBuilder builder = dbf.newDocumentBuilder();
BufferedReader br = new BufferedReader(
        new InputStreamReader(is, "UTF-8"), 8 * 1024);
builder.parse(new InputSource(br));
???
Maybe Google doesn't generate proper XML? I don't know. I switched my
code back to DOM as shown above because I got the same problem with
kXML, although the kXML documentation says, I think, that it does not
check for this kind of escaping:
In order to keep kXML as small as possible, no efforts are made to recognize
certain well-formedness errors that would require additional detection code,
such as
- ']]>' contained in text content,
- duplicate attributes, and
- <? followed by a space before the target
Cheers,
Mariano