Noticed this on xml-dev today and thought it was somewhat relevant to the discussion about sitemap validation. In essence: validate once, run many. If done this way, you wouldn't even need to validate the sitemap each time Cocoon started up, only when the digest changed...
Peter Hunsberger

-----Original Message-----
From: Niels Peter Strandberg [mailto:[EMAIL PROTECTED]
Sent: Monday, March 10, 2003 12:15 PM
To: [EMAIL PROTECTED]
Subject: [xml-dev] Let the publisher validate the xml and the make a msg digest

Let the publisher validate the XML and then make a msg digest

When an XML document is authored, the author can attach an XML Schema or DTD reference to it. The receiver gets the XML document and validates it against the schema or DTD referenced in the document, to verify that the document is valid. The XML document might be used over and over again without any changes being made to it, and it might even be validated every time. This is a waste of time!

Let the author do the validation of the finished XML document. If the document has been successfully validated against the referenced XML Schema or DTD, why should the receiver need to check it again to see if it is valid, when the author has tested it already?

My suggestion is that after the document has been validated by the author, a message digest is created, similar to the ones used in cryptography, and the digest value is appended to the XML document. All the receiver has to do is run the XML document through the same message digest and compare the two results. If they are equal, nothing in the document has changed since the author made the digest, so there is no need to validate.

So this gives you not only confirmation that the document is valid, but also that its content has not changed. This also allows DOM builders (if they are changed) to skip verifying that the data they receive from the SAX reader consists of legal XML characters, is well-formed, etc., since that checking also brings a lot of overhead. Just look at JDOM when it builds a JDOM document.
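The "validate once, skip while the digest is unchanged" idea can be sketched roughly as follows. This is only an illustration under assumptions: the names digest_of and ValidationCache are made up for this sketch, the validator is whatever expensive schema or DTD check the receiver already has, and SHA-256 is chosen as the default algorithm only because it is always available in Python's hashlib (pass "md5" for a 128-bit digest, where the local build allows it).

```python
import hashlib
from typing import Callable, Set


def digest_of(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest of the raw document bytes."""
    return hashlib.new(algorithm, data).hexdigest()


class ValidationCache:
    """Hypothetical helper: remembers digests of documents that already validated."""

    def __init__(self, validate: Callable[[bytes], bool]) -> None:
        self._validate = validate            # the expensive schema/DTD check
        self._known_good: Set[str] = set()   # digests of documents that passed

    def is_valid(self, data: bytes) -> bool:
        key = digest_of(data)
        if key in self._known_good:
            # Same bytes as a document that already passed: skip validation.
            return True
        ok = self._validate(data)
        if ok:
            self._known_good.add(key)
        return ok
```

With something like this, a second call to is_valid() with byte-identical content never reaches the validator; any change to the bytes, including whitespace, produces a different digest and triggers a fresh validation.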
Example:

    <?xml version="1.0"?>
    <Family>
      <Person>
        <Name>Fred Flintstone</Name>
      </Person>
      <Person>
        <Name>Vilma Flintstone</Name>
      </Person>
    </Family>

When I run this through openssl to make a message digest, with the command "openssl dgst flintstone.xml", it returns the digest "b99060bb744edd6aac5193da6957afcb" (the problem with this digest is that white space is also included!).

Then we can do something like this:

    <?xml version="1.0"?>
    <?digest="b99060bb744edd6aac5193da6957afcb"?> // or whatever!!!!
    <Family>
      <Person>
        <Name>Fred Flintstone</Name>
      </Person>
      <Person>
        <Name>Vilma Flintstone</Name>
      </Person>
    </Family>

The receiver can then read and remove the digest, and then verify it by running the same message digest command shown before.

It could be interesting to do some benchmarking on this.

These are just some thoughts!

Regards,
Niels Peter Strandberg
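A rough sketch of the attach / remove / verify round trip described in the post, in Python. Everything named here is an assumption made for illustration: the functions attach_digest, split_digest and verify_digest are hypothetical, the marker is written as a well-formed processing instruction of the form <?digest HEX?> rather than the "or whatever" syntax in the post, and SHA-256 is used as the default algorithm.

```python
import hashlib
import re
from typing import Optional, Tuple

# Assumed marker syntax for this sketch: <?digest HEX?> followed by a newline.
DIGEST_PI = re.compile(rb"<\?digest ([0-9a-f]+)\?>\n")


def attach_digest(xml: bytes, algorithm: str = "sha256") -> bytes:
    """Author side: hash the finished document and insert the digest marker."""
    digest = hashlib.new(algorithm, xml).hexdigest().encode("ascii")
    pos = 0
    if xml.startswith(b"<?xml "):
        # Keep the XML declaration first; insert right after it.
        pos = xml.index(b"?>") + 2
        if xml[pos:pos + 1] == b"\n":
            pos += 1
    return xml[:pos] + b"<?digest " + digest + b"?>\n" + xml[pos:]


def split_digest(stamped: bytes) -> Tuple[Optional[str], bytes]:
    """Receiver side: read the claimed digest and remove the marker."""
    match = DIGEST_PI.search(stamped)
    if match is None:
        return None, stamped
    original = stamped[:match.start()] + stamped[match.end():]
    return match.group(1).decode("ascii"), original


def verify_digest(stamped: bytes, algorithm: str = "sha256") -> bool:
    """Return True if the bytes, minus the marker, hash to the claimed digest."""
    claimed, original = split_digest(stamped)
    if claimed is None:
        return False
    return hashlib.new(algorithm, original).hexdigest() == claimed
```

Because the hash is taken over the raw bytes, this sketch has the same limitation the post points out: any change in white space changes the digest, even if the XML content is otherwise the same.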