Bugs item #1077487, was opened at 2004-12-02 12:04
Message generated for change (Comment added) made by maartenc
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=116035&aid=1077487&group_id=16035

Category: None
Group: None
>Status: Closed
>Resolution: Invalid
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
>Assigned to: Maarten Coene (maartenc)
Summary: DocumentHelper.parseText error with non ASCII characters

Initial Comment:
DocumentHelper.parseText cannot correctly parse a string
that contains non-ASCII characters

----------------------------------------------------------------------

>Comment By: Maarten Coene (maartenc)
Date: 2004-12-14 21:39

Message:
Logged In: YES 
user_id=178745

According to the XML 1.0 spec (section 2.2, the Char production),
the allowed characters are:

#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

As you can see, &#0; (that is, #x0) isn't an allowed character, so
it's perfectly OK for an XML parser to reject that character.
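
The range check above can be sketched in plain Java. This is only a minimal illustration, not dom4j code: the class name XmlChars and both method names are hypothetical, and it tests already-decoded code points (such as the value 0 that &#0; refers to), not the markup itself.

```java
// Hypothetical helper (not part of dom4j): checks code points against
// the XML 1.0 Char production quoted above.
final class XmlChars {
    private XmlChars() {}

    /** Returns true if the code point is an allowed XML 1.0 character. */
    static boolean isXmlChar(int codePoint) {
        return codePoint == 0x9
            || codePoint == 0xA
            || codePoint == 0xD
            || (codePoint >= 0x20 && codePoint <= 0xD7FF)
            || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
            || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    /** Returns the index of the first disallowed character, or -1 if none. */
    static int firstIllegalIndex(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!isXmlChar(cp)) {
                return i;
            }
            // Advance by one or two chars depending on the code point.
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(isXmlChar(0x0));    // false: the value behind &#0;
        System.out.println(isXmlChar(0x5E99)); // true: the value behind &#24217;
    }
}
```

The three non-ASCII references in the reported example (24217, 21069, 34903) all fall inside [#x20-#xD7FF], so only the value 0 fails the check.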

This is also illustrated if you use Xerces instead of
Crimson: you'll get this error:

org.dom4j.DocumentException: Error on line 1 of document  :
Character reference "&#0" is an invalid XML character.
Nested exception: Character reference "&#0" is an invalid
XML character.
        at org.dom4j.io.SAXReader.read(SAXReader.java:433)
        at ...
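
The same behaviour can be checked without dom4j by using the JAXP parser that ships with the JDK. This is a rough sketch under that assumption: the class and method names are hypothetical, and the exact wording of the error message depends on the parser implementation and locale, so it is not shown here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: parses a string with the JDK's default JAXP parser.
final class NulReferenceDemo {
    private NulReferenceDemo() {}

    /** Returns the parser's error message, or null if the text is well-formed. */
    static String parseError(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        } catch (Exception e) {
            // Configuration or I/O problems are unexpected in this sketch.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The non-ASCII references alone are accepted.
        System.out.println(parseError("<testTAG>&#24217;&#21069;&#34903;</testTAG>"));
        // Adding &#0; makes the document ill-formed, so a message is returned.
        System.out.println(parseError("<testTAG>&#24217;&#21069;&#34903;&#0;</testTAG>"));
    }
}
```

If the first call returns null and the second returns a message, the non-ASCII characters are not what the parser objects to; the &#0; reference is.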

regards,
Maarten

----------------------------------------------------------------------

Comment By: tommess (tangzg)
Date: 2004-12-13 02:32

Message:
Logged In: YES 
user_id=1176527

non-ASCII characters: &#0;

----------------------------------------------------------------------

Comment By: tommess (tangzg)
Date: 2004-12-12 13:57

Message:
Logged In: YES 
user_id=1176527


Example:
-----------------------------------------------
Document doc = DocumentHelper.parseText("<testTAG>&#24217;&#21069;&#34903;&#0;</testTAG>");


Error output:
----------------------------------------------
org.dom4j.DocumentException: Error on line 1 of document  : 
非法 XML 字符 &#x0； [i.e. "Illegal XML character &#x0;"] Nested exception: 
非法 XML 字符 &#x0；
        at org.dom4j.io.SAXReader.read(SAXReader.java:355)
        at org.dom4j.io.SAXReader.read(SAXReader.java:271)
        at org.dom4j.DocumentHelper.parseText(DocumentHelper.java:215)
        at org.dom4j.test.test.main(test.java:71)

Nested exception: 

org.xml.sax.SAXParseException: 非法 XML 字符 &#x0；
        at org.apache.crimson.parser.Parser2.fatal(Parser2.java:3182)
...

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2004-12-07 12:14

Message:
Logged In: NO 

Could you give an example illustrating the problem?

Maarten

----------------------------------------------------------------------

_______________________________________________
dom4j-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dom4j-dev
