Great! I hope this initial XML support in Derby will lead to other XML enhancements as well. SQL/XML is a large specification with many other features, such as publishing functions, XMLConcat, mapping of SQL data into XML, schema validation, XMLCast, XMLQuery and XMLTable.

I have briefly looked at the proposed changes. Some initial comments:
  1. I suspect the logic to reject XML columns at the top level is more complicated than it needs to be. Would looking at the list of ResultColumns at a top-level cursor node be sufficient? A check in ReadCursorNode for ResultColumns of XML type might be enough. We already have a similar check for '?' (parameters): you can't have a parameter as a top-level SELECT result column. Derby allows 'select * from t where i=?', but not 'select i, ? from t where i=?'. That seems similar to the XML restriction. Take a look at the rejectParameters() method. If this logic can be used, it should be possible to simplify the changes in sqlgrammar.jj, SelectNode, ResultColumn, RowResultSetNode, ResultColumnList, etc.
  2. Is it possible to consolidate some of the new nodes into existing nodes or with each other? As Dan mentioned recently, adding new classes increases Derby's footprint, so it would be worth merging some of these new nodes where possible.
  3. How do you query XML documents with namespace tags? Also, I think you have to turn on namespace processing for Xalan to get the correct results, and I didn't notice those flags being set. This may not be functioning correctly.
  4. You have mentioned that XMLConstant is only used for null values. It is possible that, with some subqueries, compilation evaluates XMLParse() into an XMLConstant before evaluating it further. In these cases, it is possible to have a valid XML constant that is not null.
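
To illustrate point 3, here is a minimal sketch of what the namespace flag changes at parse time. It uses only the JAXP DOM API that ships with the JDK and is not taken from the patch; the class name NamespaceAwareSketch, the helper parse() and the 'urn:example' namespace are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NamespaceAwareSketch {

    // Hypothetical helper: parse a string with namespace processing on or off.
    public static Document parse(String xml, boolean namespaceAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(namespaceAware); // the default is false
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<p:simp xmlns:p='urn:example'> doc </p:simp>";
        for (boolean aware : new boolean[] {false, true}) {
            Document doc = parse(xml, aware);
            // Without namespace processing the element carries no namespace URI,
            // so namespace-qualified path steps have nothing to match against.
            System.out.println("namespaceAware=" + aware
                    + " uri=" + doc.getDocumentElement().getNamespaceURI()
                    + " localName=" + doc.getDocumentElement().getLocalName());
        }
    }
}
```

Whether the patch needs an equivalent setting on the Xalan side is exactly the open question; the sketch only shows why the flag matters.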
Satheesh

Army wrote:
Please find attached the patch for adding initial XML support to Derby.  While the patch _is_ over 10k lines, note that most of that comes from two XML files that are used in testing.

Comments/details of the patch are included below.  Quoted text is pasted from my initial description of the XML support I added, which can be found here:

http://article.gmane.org/gmane.comp.apache.db.derby.devel/3602

----------------------
-- Feature Description.
----------------------

> When creating the XML datatype, I have done so in such a way as to make
> it possible to re-work the XML store to something smarter in the
> future--this textual representation is just an easy "first step" to get
> things rolling.

I've organized the code so that there's a separation between the XML datatype and its "type implementation", where a "type implementation" defines how a particular XML value is read/written/processed.  Right now, the only type implementation I've written is a UTF8-based one that stores/reads XML just like other Derby string types.  More on that below.

There are three primary classes that make up the full XML datatype picture:

1) org.apache.derby.iapi.types.XMLDataValue

An interface defining the minimal methods that every XML data value should support.  The methods on this interface correlate to the XML operations that I've added--namely, XMLPARSE, XMLSERIALIZE, and XMLEXISTS.

2) org.apache.derby.iapi.types.XML

The XML datatype.  This class implements both the XMLDataValue and the DataType interfaces.  For all DataType operations that are common to every XML implementation, this XML class does the work.  For DataType operations that depend on the particular "XML type implementation" (see below) being used, this XML class simply wraps another class that handles implementation-specific operations.

3) org.apache.derby.impl.sql.xml.XMLImpl

This is the base class for what I call "XML type implementations" (let's call it "XTI" in this email, to save me the effort of typing it).  An XML type implementation (XTI) determines how an XML data value is to be written/read to/from disk, queried, and stored in memory.  The XMLImpl class defines the methods that every XTI (whether UTF8-based or something smarter) must implement.  This class is wrapped by the XML class (#2) above and is used to handle any DataType calls that depend directly on the XTI in use.
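
The layering described above can be sketched roughly as follows. This is a simplified illustration and not the code in the patch: TypeImpl, TextImpl and XmlValue are hypothetical stand-ins for XMLImpl, XML_UTF8Impl and the XML datatype class, and the method names are assumptions.

```java
public class XtiLayeringSketch {

    /** Hypothetical stand-in for XMLImpl: everything that depends on the storage format. */
    public abstract static class TypeImpl {
        public abstract void setFromText(String xmlText);
        public abstract String getAsText();
    }

    /** Hypothetical stand-in for a text-based XTI: keeps the document as a plain string. */
    public static class TextImpl extends TypeImpl {
        private String text;

        @Override
        public void setFromText(String xmlText) { text = xmlText; }

        @Override
        public String getAsText() { return text; }
    }

    /** Hypothetical stand-in for the XML datatype: common logic here, the rest delegated. */
    public static class XmlValue {
        private final TypeImpl impl;
        private boolean isNull = true;

        public XmlValue(TypeImpl impl) { this.impl = impl; }

        // Common to every implementation, so handled by the wrapper itself.
        public boolean isNull() { return isNull; }

        // Implementation-specific work is passed through to the wrapped XTI.
        public void parse(String xmlText) {
            impl.setFromText(xmlText);
            isNull = false;
        }

        public String serialize() { return isNull ? null : impl.getAsText(); }
    }

    public static void main(String[] args) {
        XmlValue value = new XmlValue(new TextImpl());
        value.parse("<simp> doc </simp>");
        System.out.println(value.serialize());
    }
}
```

Swapping in a smarter storage format would then mean writing another TypeImpl subclass, without touching the wrapper.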

> The on-disk format that I'm using is a simple textual representation of
> XML.  In other words, an XML document on disk is really just stored as a
> UTF-8 character string (similar to other JDBC string types).

I have created a UTF8-based XTI with the class

org.apache.derby.impl.sql.xml.XML_UTF8Impl

which extends XMLImpl.  This class takes the "easy way out" and just wraps XML data as an instance of SQLChar.  It reads/writes data in UTF-8, just like other Derby string types.  It uses the Xerces parser to parse XML data and to check well-formedness, and it uses the XSLT processor from Xalan to query.

This UTF8-based implementation is, of course, far from ideal.  The fact that we store XML data on disk as a string means that we have to re-parse it every time we want to query it, which has obvious performance issues.  But it was an easy "first step" for XML and I hope that future development can replace this with something smarter and faster.
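
To make the re-parse cost concrete, here is a minimal sketch of an XMLEXISTS-style check over text-stored XML. It is an illustration only: it uses the JDK's javax.xml.xpath API rather than calling Xalan directly as the patch does, and the class ReparseQuerySketch and method exists() are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ReparseQuerySketch {

    // Every call parses the stored text from scratch before the path can be evaluated.
    public static boolean exists(String xpath, String storedXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(storedXml)));
        NodeList hits = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate(xpath, doc, XPathConstants.NODESET);
        return hits.getLength() > 0;
    }

    public static void main(String[] args) throws Exception {
        String stored = "<simp> doc </simp>";
        System.out.println(exists("/simp", stored));
        System.out.println(exists("/other", stored));
    }
}
```

A smarter implementation could keep a pre-parsed form on disk and skip the parse step on each query.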

In order to add a new XTI, one simply needs to create a class that extends "XMLImpl", implement all of the abstract methods, and then add some logic in two methods defined on the XMLImpl class.  The comments in that file describe what those methods are and what the logic should be.

Note that the APIs used for XML processing are included in JDBC 3.0, and thus are inherently available from the 1.4.1 JVMs.  In addition, the Xerces parser that we use is loaded dynamically at run time, which means that the codeline WILL build even if Xerces doesn't exist in the classpath.  That said, though, since I use the Xerces parser, anyone who wishes to _use_ XML in Derby will have to put Xerces in his/her classpath--this is something we may want to revisit at a later date.  Nonetheless, if a user does NOT want to use XML, s/he does NOT have to have Xerces in his/her classpath--that's another benefit of loading Xerces dynamically: a user who uses Derby for "normal", non-XML reasons is not required to have any additional jars in his/her classpath.
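
The general idea behind loading the parser dynamically can be sketched as below. This is a generic reflection example under stated assumptions, not the patch's loading code: OptionalParserLoader and loadIfPresent() are hypothetical names, and the Xerces class named in main() is only an example of something that may or may not be on the classpath.

```java
public class OptionalParserLoader {

    /** Returns a new instance of the named class, or null if it cannot be loaded. */
    public static Object loadIfPresent(String className) {
        try {
            // No compile-time reference to the class, so the build never needs the jar.
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Assumption: this Xerces factory class is used purely as an example name.
        Object factory = loadIfPresent("org.apache.xerces.jaxp.DocumentBuilderFactoryImpl");
        if (factory == null) {
            System.out.println("Parser not on classpath; XML operations would be rejected.");
        } else {
            System.out.println("Parser loaded: " + factory.getClass().getName());
        }
    }
}
```

Only the code path that actually needs XML pays for the missing jar; everything else runs as before.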

> All of the XML functionality that I've written for Derby is based on the
> first (ISO approved) and second (still in development) editions of the
> SQL/XML specification.

This is still true, and as mentioned in some earlier posts, this means that the *** XML syntax we use is apt to change *** (especially for the XMLEXISTS operator). Anyone using XML in Derby should be aware of this fact.

> A. Created an XML type that can be both transient (SQL/XML[2003] X010)
> and persistent (SQL/XML[2003] X016).

Completed as described in my initial email.  Ex:

ij> CREATE TABLE xTable (i INT PRIMARY KEY, x XML);
0 rows inserted/updated/deleted

> B. Created an XMLPARSE function to parse XML (SQL/XML Feature X061).

Completed as described in my initial email, with one exception.  In my initial email, I mentioned that it was up to Xerces to do schema validation at parse time.  Since then, I realized that the SQL/XML[2003] spec explicitly states that XMLPARSE should NOT validate a document.  Thus, while XMLPARSE _will_ check the well-formedness of the document and _will_ parse any associated DTDs to load defaults and/or other DTD-related info, it will _not_ perform validation against the DTD, nor will it validate against an XML Schema Document.

Syntax is as follows:

XMLPARSE( DOCUMENT <string-value-expression> PRESERVE WHITESPACE )

Ex:

ij> INSERT INTO xTable VALUES (1, XMLPARSE(DOCUMENT '<simp> doc </simp>' PRESERVE WHITESPACE));
1 row inserted/updated/deleted
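
The "well-formed but not validated" behavior described for XMLPARSE can be sketched with a non-validating JAXP parser. This is an illustration under assumptions, not the patch's code: WellFormedCheckSketch and isWellFormed() are hypothetical names, and the JDK's default parser stands in for Xerces.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheckSketch {

    public static boolean isWellFormed(String xmlText) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false); // check well-formedness only, no DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // keep parse errors off stderr
            builder.parse(new InputSource(new StringReader(xmlText)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<simp> doc </simp>"));
        System.out.println(isWellFormed("<simp> doc </simpx>"));
        // Violates its own DTD (a must contain b) but is still well-formed.
        System.out.println(isWellFormed(
                "<!DOCTYPE a [<!ELEMENT a (b)> <!ELEMENT b EMPTY>]><a/>"));
    }
}
```
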

> C. Created an XMLSERIALIZE function to serialize an XML value into a
> string (SQL/XML[2003] Feature X071).

Completed as described in my initial email.  The syntax is:

XMLSERIALIZE( <xml-value-expression> AS <string-data-type> )

Ex:

ij> SELECT i, XMLSERIALIZE(x AS CHAR(20)) FROM xTable;
I          |2
--------------------------------
1          |<simp> doc </simp>

1 row selected

> D. Created an XMLEXISTS function for simple querying of XML values
> (SQL/XML[2004] Feature X096).

Completed as described in my initial email.  The syntax is:

XMLEXISTS( <xpath-expression> PASSING BY VALUE <xml-value-expression> )

Note, though, that this is based on the 2004 working draft of the spec, and thus ** is susceptible to change ** in the future.

Ex:

ij> SELECT i FROM xTable where XMLEXISTS('/simp' PASSING BY VALUE x);
I
-----------
1

1 row selected

The details of all of these changes are included in the comments for the files.  I think I've done a pretty thorough job of commenting, but people should let me know if they'd like more in any particular area.

----------------------
-- Known issue.
----------------------

In my initial email, I mentioned that I was going to disallow binding to/from an XML parameter.  While I have this working for embedded mode, I still need to figure out how to enforce this in server mode.  Since the setXXX methods are implemented by the client, we need to look for XML parameters at statement preparation time and throw compile-time errors.  I was looking at this for a while yesterday and, oddly enough, couldn't nail it down--but hopefully I'm just missing something small.  Since that's the only issue that I know of with this patch, I thought I'd send it out and let people start reviewing it while I look at the binding problem.  As a result, anyone who uses the attached patch and then tries to bind a parameter to an XML value over the server is going to have problems.  But since the goal is to disallow that behavior altogether (in a graceful manner, of course), hopefully people can just avoid doing that until I have a fix...

----------------------
-- Patch details.
----------------------

Since a built-in datatype tends to affect many areas, the patch modifies a good number of files--but note that the changes to most of those files are pretty minor.

The total patch is over 10,000 lines, but more than half of that is the result of two 40k XML documents that I've added for the sake of testing.  And most of the rest is from new files--so no, that's not 10,000 lines of code changes ;)

I created two new directories.  This means that, since the "patch" command can't create directories on its own (at least, not the patch command I use), you may need to create the directories manually BEFORE applying the patch.  The new directories are:

java/engine/org/apache/derby/impl/sql/xml
java/testing/org/apache/derbyTesting/functionTests/tests/lang/xmlTestFiles

The first directory holds the "XML Type implementation" classes mentioned above, along with a build.xml file that is needed so that the XTIs are only built using JDK 1.4.  The required XML APIs aren't in JDK 1.3 or prior, so Derby will not support XML for 1.3.

The second directory holds a bunch of files used for XML testing.

The results from an "svn stat" are attached to this email along with the patch.

I ran the "derbylang" suite with Sun JDK 1.4.2 on Windows and all of the tests passed.  I haven't had a chance to run the full "derbyall" suite yet, but plan to do that tonight. Yes, I realize that's very important, and I certainly plan to do it ASAP--but I thought it'd be good to get the patch out and have people start looking at it.  If there are any failures in "derbyall" when I run it locally tonight, I will address them tomorrow.

Feedback is appreciated,
Army

M tools\jar\DBMSnodes.properties
M java\engine\org\apache\derby\impl\sql\compile\NodeFactoryImpl.java
A java\engine\org\apache\derby\impl\sql\compile\XMLSerializeOperatorNode.java
A java\engine\org\apache\derby\impl\sql\compile\XMLConstantNode.java
M java\engine\org\apache\derby\impl\sql\compile\SelectNode.java
M java\engine\org\apache\derby\impl\sql\compile\QueryTreeNode.java
M java\engine\org\apache\derby\impl\sql\compile\ResultColumn.java
M java\engine\org\apache\derby\impl\sql\compile\C_NodeNames.java
A java\engine\org\apache\derby\impl\sql\compile\XMLExistsOperatorNode.java
A java\engine\org\apache\derby\impl\sql\compile\XMLParseOperatorNode.java
M java\engine\org\apache\derby\impl\sql\compile\TypeCompilerFactoryImpl.java
M java\engine\org\apache\derby\impl\sql\compile\RowResultSetNode.java
M java\engine\org\apache\derby\impl\sql\compile\sqlgrammar.jj
M java\engine\org\apache\derby\impl\sql\compile\DB2LengthOperatorNode.java
M java\engine\org\apache\derby\impl\sql\compile\CharTypeCompiler.java
M java\engine\org\apache\derby\impl\sql\compile\UnaryOperatorNode.java
M java\engine\org\apache\derby\impl\sql\compile\ResultColumnList.java
A java\engine\org\apache\derby\impl\sql\compile\XMLTypeCompiler.java
M java\engine\org\apache\derby\impl\sql\build.xml
A java\engine\org\apache\derby\impl\sql\xml
A java\engine\org\apache\derby\impl\sql\xml\XMLImpl.java
A java\engine\org\apache\derby\impl\sql\xml\XML_UTF8Impl.java
A java\engine\org\apache\derby\impl\sql\xml\build.xml
M java\engine\org\apache\derby\impl\sql\catalog\DataDictionaryImpl.java
M java\engine\org\apache\derby\impl\jdbc\Util.java
M java\engine\org\apache\derby\iapi\sql\compile\C_NodeTypes.java
M java\engine\org\apache\derby\iapi\services\build.xml
M java\engine\org\apache\derby\iapi\services\io\RegisteredFormatIds.java
M java\engine\org\apache\derby\iapi\services\io\StoredFormatIds.java
A java\engine\org\apache\derby\iapi\types\XML.java
M java\engine\org\apache\derby\iapi\types\DataTypeUtilities.java
M java\engine\org\apache\derby\iapi\types\build.xml
M java\engine\org\apache\derby\iapi\types\SQLChar.java
M java\engine\org\apache\derby\iapi\types\TypeId.java
M java\engine\org\apache\derby\iapi\types\DataValueFactoryImpl.java
A java\engine\org\apache\derby\iapi\types\XMLDataValue.java
M java\engine\org\apache\derby\iapi\types\DTSClassInfo.java
M java\engine\org\apache\derby\iapi\types\StringDataValue.java
M java\engine\org\apache\derby\iapi\types\DataValueFactory.java
M java\engine\org\apache\derby\iapi\reference\SQLState.java
M java\engine\org\apache\derby\iapi\reference\ClassName.java
M java\engine\org\apache\derby\catalog\types\TypesImplInstanceGetter.java
M java\engine\org\apache\derby\catalog\types\BaseTypeIdImpl.java
M java\engine\org\apache\derby\loc\messages_en.properties
A java\testing\org\apache\derbyTesting\functionTests\tests\lang\xmlBinding.java
M java\testing\org\apache\derbyTesting\functionTests\tests\lang\copyfiles.ant
A java\testing\org\apache\derbyTesting\functionTests\tests\lang\xmlTestFiles
A java\testing\org\apache\derbyTesting\functionTests\tests\lang\xmlTestFiles\dtdDoc.xml
A java\testing\org\apache\derbyTesting\functionTests\tests\lang\xmlTestFiles\personal.xsd
A java\testing\org\apache\derbyTesting\functionTests\tests\lang\xmlTestFiles\xsdDoc.xml
A java\testing\org\apache\derbyTesting\functionTests\tests\lang\xmlTestFiles\dtdDoc_invalid.xml
A java\testing\org\apache\derbyTesting\functionTests\tests\lang\xmlTestFiles\wide40k.xml
A java\testing\org\apache\derbyTesting\functionTests\tests\lang\xmlTestFiles\xsdDoc_invalid.xml
A java\testing\org\apache\derbyTesting\functionTests\tests\lang\xmlTestFiles\deep40k.xml
A java\testing\org\apache\derbyTesting\functionTests\tests\lang\xmlTestFiles\personal.dtd
A java\testing\org\apache\derbyTesting\functionTests\tests\lang\xmlBinding_app.properties
A java\testing\org\apache\derbyTesting\functionTests\tests\lang\xml_general.sql
A java\testing\org\apache\derbyTesting\functionTests\master\DerbyNet\xml_general.out
A java\testing\org\apache\derbyTesting\functionTests\master\xml_general.out
A java\testing\org\apache\derbyTesting\functionTests\master\DerbyNetClient\xml_general.out
A java\testing\org\apache\derbyTesting\functionTests\master\xmlBinding.out
M java\testing\org\apache\derbyTesting\functionTests\suites\derbylang.runall
M java\testing\org\apache\derbyTesting\functionTests\suites\derbynetmats.runall

Index: tools/jar/DBMSnodes.properties
===================================================================
--- tools/jar/DBMSnodes.properties (revision 178406)
+++ tools/jar/DBMSnodes.properties (working copy)
@@ -114,3 +114,7 @@
 derby.module.cloudscapenodes.ge=org.apache.derby.impl.sql.compile.SavepointNode
 derby.module.cloudscapenodes.gf=org.apache.derby.impl.sql.compile.IntersectOrExceptNode
 derby.module.cloudscapenodes.gg=org.apache.derby.impl.sql.compile.UnaryDateTimestampOperatorNode
+derby.module.cloudscapenodes.gh=org.apache.derby.impl.sql.compile.XMLConstantNode
+derby.module.cloudscapenodes.gi=org.apache.derby.impl.sql.compile.XMLParseOperatorNode
+derby.module.cloudscapenodes.gj=org.apache.derby.impl.sql.compile.XMLSerializeOperatorNode
+derby.module.cloudscapenodes.gk=org.apache.derby.impl.sql.compile.XMLExistsOperatorNode
Index: java/engine/org/apache/derby/impl/sql/compile/NodeFactoryImpl.java