Hi,

I found a "feature" related to the SHA-1 message digest that is stored in XmlDocumentProperties when an InputStream is parsed with the LOAD_STRIP_WHITESPACE option. The digest appears to be calculated over the unstripped XML, even though the resulting document is stripped.
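If the digest is computed by wrapping the input in a java.security.DigestInputStream, the effect is easy to reproduce without XMLBeans at all. Below is a minimal JDK-only sketch, not XMLBeans code. The class name DigestStripDemo and the helper digestWhileStripping are hypothetical. Dropping every whitespace byte is only a crude stand-in for LOAD_STRIP_WHITESPACE, which removes whitespace-only text rather than every space. Because DigestInputStream updates the digest for every byte it hands out, the result matches the original bytes and not the bytes that survive stripping.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class DigestStripDemo {

    // SHA-1 over a complete byte array.
    static byte[] sha1(byte[] data) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance("SHA-1").digest(data);
    }

    // Hypothetical helper: reads the input through a DigestInputStream and
    // keeps only non-whitespace bytes. Each byte is fed to the digest *before*
    // the consumer decides whether to keep it, so stripped bytes still count.
    static byte[] digestWhileStripping(byte[] input, ByteArrayOutputStream kept)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        try (InputStream in = new DigestInputStream(new ByteArrayInputStream(input), md)) {
            int b;
            while ((b = in.read()) != -1) {
                if (!Character.isWhitespace(b)) {
                    kept.write(b);
                }
            }
        }
        return md.digest();
    }

    public static void main(String[] args) throws Exception {
        byte[] original = "<doc>\n  <e1/>\n</doc>\n".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream kept = new ByteArrayOutputStream();
        byte[] streamDigest = digestWhileStripping(original, kept);
        byte[] stripped = kept.toByteArray();

        System.out.println("kept bytes: " + new String(stripped, StandardCharsets.UTF_8)); // <doc><e1/></doc>
        System.out.println("matches original: " + Arrays.equals(streamDigest, sha1(original))); // true
        System.out.println("matches stripped: " + Arrays.equals(streamDigest, sha1(stripped))); // false
    }
}
```

The same ordering problem would apply to any option that drops content after the raw bytes have passed through the digesting stream.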
This might be related to the use of DigestInputStream in the method "parse(InputStream jiois, SchemaType type, XmlOptions options)" in the class SchemaTypeLoaderBase: the message digest is updated automatically as bytes are read from the DigestInputStream, regardless of whether a byte is stripped afterwards. Other stripping XmlOptions might show the same "feature", although I haven't verified it.

The sample below shows the behaviour. Digest 1 and Digest 2 are equal, while Digest 3 differs. As I see it, Digest 2 and Digest 3 should be equal, and both should differ from Digest 1.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;

import org.apache.xmlbeans.XmlObject;
import org.apache.xmlbeans.XmlOptions;

public class DigestStripRepro {

    public static void main(String[] args) throws Exception {
        String input = ""
            + "<!DOCTYPE doc [<!ATTLIST e9 attr CDATA \"default\">]>\n"
            + "<!-- Comment 2 --><doc>\n"
            + " <e1 />\n"
            + " <e2 ></e2>\n"
            + " <e3 name = \"elem3\" id=\"elem3\" />\n"
            + " <e4 name=\"elem4\" id=\"elem4\" ></e4>\n"
            + " <e5 a:attr=\"out\" b:attr=\"sorted\" attr2=\"all\" attr=\"I'm\"\n"
            + " xmlns:b=\"http://www.ietf.org\"\n"
            + " xmlns:a=\"http://www.w3.org\"\n"
            + " xmlns=\"http://example.org\"/>\n"
            + " <e6 xmlns=\"\" xmlns:a=\"http://www.w3.org\">\n"
            + " <e7 xmlns=\"http://www.ietf.org\">\n"
            + " <e8 xmlns=\"\" xmlns:a=\"http://www.w3.org\">\n"
            + " <e9 xmlns=\"\" xmlns:a=\"http://www.ietf.org\"/>\n"
            + " <text>©</text>\n"
            + " </e8>\n"
            + " </e7>\n"
            + " </e6>\n"
            + "</doc><!-- Comment 3 -->\n";

        // Encode explicitly as UTF-8 (the XML default) instead of the platform charset,
        // so the '©' character reaches the parser as valid bytes.
        byte[] bytes = input.getBytes(StandardCharsets.UTF_8);

        // Digest 1: digest of the original XML, including whitespace
        System.out.println("Digest 1: " + toHex(sha1(new ByteArrayInputStream(bytes))));

        // Parse the XML with the whitespace-stripping and message-digest options set
        XmlOptions options = new XmlOptions();
        options.setLoadStripWhitespace();
        options.setLoadMessageDigest();
        XmlObject xo = XmlObject.Factory.parse(new ByteArrayInputStream(bytes), options);

        // Digest 2: digest recorded by XMLBeans while parsing
        System.out.println("Digest 2: " + toHex(xo.documentProperties().getMessageDigest()));

        // Digest 3: digest of the parsed XML, excluding the stripped whitespace
        System.out.println("Digest 3: " + toHex(sha1(xo.newInputStream())));
    }

    // Reads the stream to the end through a DigestInputStream and returns its SHA-1.
    static byte[] sha1(InputStream source) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        try (DigestInputStream in = new DigestInputStream(source, md)) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // reading is enough; the digest is updated as a side effect
            }
        }
        return md.digest();
    }

    // Hex-encodes a digest; new String(raw) would print unreadable bytes.
    static String toHex(byte[] raw) {
        StringBuilder sb = new StringBuilder(raw.length * 2);
        for (byte b : raw) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

An obvious workaround is to calculate the message digest manually after parsing. However, from a performance perspective it is better to have the digest calculated during parsing, since otherwise you have to run over the XML twice.

What do you think: is this wanted or unwanted behaviour?

Cheers,
Sami Mäkelä
Heimore Group