I believe the spec explains why best. Unfortunately, I think the behavior you are seeing (and do not want) is actually correct.

This came from http://www.w3.org/TR/REC-xml

============================================================================
3.3.3 Attribute-Value Normalization

Before the value of an attribute is passed to the application or checked for validity, the XML processor must normalize the attribute value by applying the algorithm below, or by using some other method such that the value passed to the application is the same as that produced by the algorithm.

1. All line breaks must have been normalized on input to #xA as described in 2.11 End-of-Line Handling, so the rest of this algorithm operates on text normalized in this way.
2. Begin with a normalized value consisting of the empty string.
3. For each character, entity reference, or character reference in the unnormalized attribute value, beginning with the first and continuing to the last, do the following:
* For a character reference, append the referenced character to the normalized value.
* For an entity reference, recursively apply step 3 of this algorithm to the replacement text of the entity.
* For a white space character (#x20, #xD, #xA, #x9), append a space character (#x20) to the normalized value.
* For another character, append the character to the normalized value.


If the attribute type is not CDATA, then the XML processor must further process the normalized attribute value by discarding any leading and trailing space (#x20) characters, and by replacing sequences of space (#x20) characters by a single space (#x20) character.

Note that if the unnormalized attribute value contains a character reference to a white space character other than space (#x20), the normalized value contains the referenced character itself (#xD, #xA or #x9). This contrasts with the case where the unnormalized value contains a white space character (not a reference), which is replaced with a space character (#x20) in the normalized value and also contrasts with the case where the unnormalized value contains an entity reference whose replacement text contains a white space character; being recursively processed, the white space character is replaced with a space character (#x20) in the normalized value.

All attributes for which no declaration has been read should be treated by a non-validating processor as if declared CDATA.

============================================================================
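
The note about character references is the useful part for your case: a literal line break in the attribute becomes a space, but a character reference such as &#10; should survive normalization. Here is a minimal sketch of that, assuming the JDK's built-in DOM parser rather than dom4j so that it runs on its own. The class name AttrNewlineDemo and the helper atrValue are made up for illustration. I would expect dom4j to hand back the same values, since the normalization is done by the underlying XML parser, but I have not verified that here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class AttrNewlineDemo {

    // Hypothetical helper: parse the XML text and return the value of the
    // root element's "atr" attribute, as normalized by the parser.
    static String atrValue(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getAttribute("atr");
    }

    public static void main(String[] args) throws Exception {
        // Literal line break inside the attribute value: replaced by a space.
        String literal = atrValue("<node atr=\"line 1\nline 2\"/>");

        // Character reference to #xA: the referenced character itself is kept.
        String charRef = atrValue("<node atr=\"line 1&#10;line 2\"/>");

        System.out.println(literal.equals("line 1 line 2"));
        System.out.println(charRef.equals("line 1\nline 2"));
    }
}
```

So if you control how the document is written, putting &#10; (or &#13;&#10;) in the attribute value instead of a raw line break is probably the way to get the line break back out.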

Hope that helps.

Dave


David Thielen wrote:
Hi;
If I have:
<node>line 1
line 2</node>
Then Node.valueOf("/node") will return "line 1\r\nline 2" which is what I want.
But if I have:
<node atr = "line 1
line 2"></node>
Then Node.valueOf("/node/@atr") will return "line 1 line 2" - i.e. no \r\n. Is there any way to get the \r\n?
thanks - dave


--

+------------------------------------------------------------+
| David Lucas                        mailto:[EMAIL PROTECTED]  |
| Lucas Software Engineering, Inc.   (740) 964-6248 Voice    |
| Unix,Java,C++,CORBA,XML,EJB        (614) 668-4020 Mobile   |
| Middleware,Frameworks              (888) 866-4728 Fax/Msg  |
+------------------------------------------------------------+
| GPS Location:  40.0150 deg Lat,  -82.6378 deg Long         |
| IMHC: "Jesus Christ is the way, the truth, and the life."  |
| IMHC: "I know where I am; I know where I'm going."    <><  |
+------------------------------------------------------------+

Notes: PGP Key Block=http://www.lse.com/~ddlucas/pgpblock.txt
IMHO="in my humble opinion" IMHC="in my humble conviction"
All trademarks above are those of their respective owners.




dom4j-user mailing list https://lists.sourceforge.net/lists/listinfo/dom4j-user
