Bugs item #1077487, was opened at 2004-12-02 12:04
Message generated for change (Comment added) made by maartenc
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=116035&aid=1077487&group_id=16035

Category: None
Group: None
>Status: Closed
>Resolution: Invalid
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
>Assigned to: Maarten Coene (maartenc)
Summary: DocumentHelper.parseText error with non ASCII characters

Initial Comment:
DocumentHelper.parseText cannot correctly parse a string that contains non-ASCII characters

----------------------------------------------------------------------

>Comment By: Maarten Coene (maartenc)
Date: 2004-12-14 21:39

Message:
Logged In: YES
user_id=178745

According to the XML spec, the allowed characters are:

#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

As you can see, � isn't an allowed character, so it's perfectly OK for an XML parser to reject it.
This is also illustrated if you use Xerces instead of Crimson; you'll get this error:

org.dom4j.DocumentException: Error on line 1 of document : Character reference "�" is an invalid XML character. Nested exception: Character reference "�" is an invalid XML character.
    at org.dom4j.io.SAXReader.read(SAXReader.java:433)
    at ...

regards,
Maarten

----------------------------------------------------------------------

Comment By: tommess (tangzg)
Date: 2004-12-13 02:32

Message:
Logged In: YES
user_id=1176527

non ASCII characters : �

----------------------------------------------------------------------

Comment By: tommess (tangzg)
Date: 2004-12-12 13:57

Message:
Logged In: YES
user_id=1176527

Example:
-----------------------------------------------
Document doc = DocumentHelper.parseText("<testTAG>庙前街 �</testTAG>");

Error output:
----------------------------------------------
org.dom4j.DocumentException: Error on line 1 of document : 非法 XML 字符�; [Illegal XML character] Nested exception: 非法 XML 字符�;
    at org.dom4j.io.SAXReader.read(SAXReader.java:355)
    at org.dom4j.io.SAXReader.read(SAXReader.java:271)
    at org.dom4j.DocumentHelper.parseText(DocumentHelper.java:215)
    at org.dom4j.test.test.main(test.java:71)
Nested exception: org.xml.sax.SAXParseException: 非法 XML 字符�;
    at org.apache.crimson.parser.Parser2.fatal(Parser2.java:3182)
    ...

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2004-12-07 12:14

Message:
Logged In: NO

Could you give an example illustrating the problem?

Maarten

----------------------------------------------------------------------
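
The character ranges quoted from the XML spec in the 2004-12-14 comment can be turned into a small pre-check. What follows is only a sketch, not part of dom4j: the class name XmlCharCheck and both method names are made up for illustration, and U+001F is an assumed stand-in for the offending character, which is not recoverable from this thread. It only looks at literal characters in the string; it does not detect numeric character references (the Xerces message above mentions a character reference), which would still be rejected by the parser.

```java
public class XmlCharCheck {

    // Mirrors the ranges quoted in the thread:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int codePoint) {
        return codePoint == 0x9
            || codePoint == 0xA
            || codePoint == 0xD
            || (codePoint >= 0x20 && codePoint <= 0xD7FF)
            || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
            || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    // Hypothetical helper: drops every code point outside the allowed ranges.
    // Unpaired surrogates are reported by codePoints() as their own value
    // (0xD800-0xDFFF) and are therefore dropped as well.
    static String stripInvalidXmlChars(String text) {
        StringBuilder cleaned = new StringBuilder(text.length());
        text.codePoints()
            .filter(XmlCharCheck::isXmlChar)
            .forEach(cleaned::appendCodePoint);
        return cleaned.toString();
    }

    public static void main(String[] args) {
        // U+001F is an assumed example of a disallowed control character.
        String raw = "<testTAG>abc\u001Fdef</testTAG>";
        System.out.println(isXmlChar(0x1F));            // false
        System.out.println(isXmlChar(0x5E99));          // true (a CJK ideograph)
        System.out.println(stripInvalidXmlChars(raw));  // <testTAG>abcdef</testTAG>
    }
}
```

Whether silently dropping such characters is acceptable depends on the data; rejecting the input, as the parser does, is equally valid.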