I've come across one vexing and one minor issue while trying to debug some castor XML code. Attached is a patch against 0.9.9 that I believe fixes both issues. In patchfile order:

1. AnyNode.getStringValue for ELEMENT nodes acts strangely and at odds with the documentation. In particular, consider the following xml:
   <a>some<b>text</b>is here</a>
With the current code, getStringValue() of the node associated with the <b> block will be "is heretext" (clearly unexpectedly). The current code constructs a very large buffer, then appends all the getStringValue of all later siblings, and only then appends the getStringValue of the children. I cannot imagine ever wanting this behavior, but...

2. I've rewritten the code that merges adjoining TEXT elements for slightly better efficiency (eliminating redundant checks, sizing the buffer correctly at the beginning). Also, I used buf.substring(0) rather than buf.toString() to avoid sharing structure between the original stringbuffer and the new String. This is likely to be a small memory win but a small performance loss, but YMMV.

cheers

--
*Michael Thome*
BBN Technologies 10 Moulton St, Cambridge MA 02138 USA
phone: +1 617 873 1853

--- AnyNode.java.orig	2005-10-20 16:39:31.000000000 -0400
+++ AnyNode.java	2005-10-20 16:42:30.000000000 -0400
@@ -486,14 +486,7 @@
                     return _value;
                 case ELEMENT:
                     StringBuffer result = new StringBuffer(4096);
-                    AnyNode tempNode = this.getNextSibling();
-                    while (tempNode != null &&
-                           tempNode.getNodeType() == TEXT)
-                    {
-                        result.append(tempNode.getStringValue());
-                        tempNode = tempNode.getNextSibling();
-                    }
-                   tempNode = this.getFirstChild();
+                    AnyNode tempNode = this.getFirstChild();
                     while (tempNode != null) {
                         result.append(tempNode.getStringValue());
                         tempNode = tempNode.getNextSibling();
@@ -653,15 +646,25 @@
       * @param node1 the AnyNode that receives the text value
       * @param node2 the AnyNode that needs to be merges with node1.
       */
-     private void mergeTextNode(AnyNode node1, AnyNode node2) {
-         if (node1.getNodeType() != node2.getNodeType())
-            return;
-        if (node1.getNodeType() != AnyNode.TEXT)
-            return;
-        StringBuffer temp = new StringBuffer(node1.getStringValue());
-        temp.append(node2.getStringValue());
-        node1._value = temp.toString();
-        node2 = null;
+      private void mergeTextNode(AnyNode node1, AnyNode node2) {
+       // these type checks are wasteful, since this is a private method used only
+       // once, immediately after such checks.  If you want to retain the checks,
+       // IMHO an exception ought to be thrown rather than NOOP.
+       //if (node1.getNodeType() != node2.getNodeType())
+       //  return;
+       //if (node1.getNodeType() != AnyNode.TEXT)
+       //  return;
+
+       String v1 = node1.getStringValue();
+       String v2 = node2.getStringValue();
+       // size the buffer to exactly the right size
+       StringBuffer temp = new StringBuffer(v1.length()+v2.length());
+       temp.append(v1);
+       temp.append(v2);
+       // toString() will maintain a reference to the buffer, preventing GC of the buffer
+       // this construct avoids internal structure sharing.
+       node1._value = temp.substring(0);
+       // setting node2=null is useless.
      }
 
 }

-------------------------------------------------
If you wish to unsubscribe from this list, please 
send an empty message to the following address:

[EMAIL PROTECTED]
-------------------------------------------------

Reply via email to