Re: [docbook-apps] Stripping comments

Bob Stayton Fri, 30 Mar 2007 08:45:50 -0800

You could use XSLT, but you might not like the results.  8^)
You start with an identity stylesheet such as the following:


<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
               version="1.0">

<xsl:output indent="no"/>

<xsl:template match="node()|@*">
  <xsl:copy>
    <xsl:apply-templates select="@*"/>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>

Then you add a template to strip out comments:

<xsl:template match="comment()"/>

There are several problems with using XSLT though:

1. Entity references are expanded, not preserved as entity references. You can't hide them in XSLT because the parser expands them before the stylesheet sees them.

2. Any DOCTYPE declaration is removed. You have to copy your doctype public and system identifiers to the xsl:output element's doctype-public and doctype-system attributes. The stylesheet can't do it, because the DOCTYPE is not accessible to XPath. Any internal DTD subset is lost, as there is no way for xsl:output to specify it.

3. Default DocBook attributes are added. You will end up with a lot of moreinfo="none" attributes on elements like literal.

4. The output will differ in other ways because the XML is parsed and then re-serialized: attribute order may be different, empty elements may be expressed differently, character references will become native UTF-8 (unless you specify a different output encoding). These differences will show up in a text diff program, but not an XML-aware differencing program.
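
Point 4 applies to any parse-and-reserialize round trip, not only XSLT. Here is a small sketch that uses Python's xml.etree.ElementTree as a stand-in for an XSLT processor. That substitution is an assumption made purely for illustration, the sample string is invented, and the serializer in Saxon or xsltproc will differ in its details:

```python
import xml.etree.ElementTree as ET

# Invented sample: one numeric character reference and one empty
# element written with separate start and end tags.
source = '<para>caf&#233; <anchor id="a1"></anchor></para>'

# Parse and serialize again without deliberately changing anything.
root = ET.fromstring(source)
result = ET.tostring(root, encoding="unicode")

# The two strings describe the same XML but are not byte-identical:
# the character reference comes back as a literal character and the
# empty element is written in its short form.
print(source)
print(result)
```

A text diff of the two printed lines reports a change even though nothing was edited.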

Generally, I use Perl for such filtering. The XML comment string is a well-defined regular expression, and Perl doesn't mess with any XML stuff. I read the entire file into a single string, globally replace comments with nothing, and then print the string.
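
A minimal sketch of that read, replace, print approach follows. It is written in Python rather than Perl purely for illustration and is not the actual script. The name strip_comments and the sample text are hypothetical. It also assumes that nothing resembling a comment appears inside a CDATA section, where the pattern would match just the same.

```python
import re

# An XML comment runs from "<!--" to the first "-->" that follows.
# DOTALL lets "." cross line breaks so multi-line comments match;
# the non-greedy ".*?" stops at the nearest terminator.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_comments(text: str) -> str:
    """Return text with every XML comment replaced by nothing."""
    return COMMENT_RE.sub("", text)


# Invented sample: the whole document is held as one string.
sample = """<para>Keep &copy; this.<!-- internal:
do not ship --></para>"""

cleaned = strip_comments(sample)
print(cleaned)
```

Because the text is never parsed as XML, the &copy; entity reference, any DOCTYPE, and the original attribute layout pass through untouched. To filter a real file, read it whole into one string, call the function, and write the result back out in the same encoding.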


Bob Stayton
Sagehill Enterprises
DocBook Consulting
[EMAIL PROTECTED]

----- Original Message -----
From: "Paul Moloney" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Thursday, March 29, 2007 6:45 AM
Subject: [docbook-apps] Stripping comments

One task I have is to package our source XML files for use by integrators;
one thing I'd like to do is first strip the comments from these files as
they may contain sensitive information.
I was thinking that this could be done by processing each file through Saxon
using a stylesheet which strips out comments and outputs the XML again. But
rather than risk reinventing the wheel, I was wondering if anyone out there
has implemented a DocBook comment stripper in their build process?

Thanks,

P.
--
View this message in context: http://www.nabble.com/Stripping-comments-tf3486783.html#a9734912
Sent from the docbook apps mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



