Here's a quick Perl solution that works line by line rather than
reading everything into memory, and seems to handle the common edge
cases (several comments on one line, comments that span several
lines).  Try it out on a few things to verify that everything is okay
before completely trusting it, though. :)

 Copy the lines between '------------' into a file (say
strip_xml_comments.pl).

(On Unix, make the script executable first.)
chmod 755 strip_xml_comments.pl

  Make a backup copy of any and all files that you'll be using.  (The
script should work fine as is, but it's *MUCH* better to be safe than
sorry. :)

  Now you should be able to run the script on a copy of your input file
(on Unix, put './' in front of the script name unless the current
directory is in your PATH).

strip_xml_comments.pl my_xml_input_file.xml

  The script edits the file in place and keeps the original contents
in a backup copy with '.orig' at the end of the name. (Please don't
just rely on this feature; make your own backup as well.)

 Verify that everything looks okay and integrate it into your
application stream.

 Here's the script

----------------------
#!/usr/bin/perl -w -i.orig

#
# NB: Delete the '.orig' portion if backup copies are not desired
#

#
# Delete XML comments.
#


#
# Go through every file given on the command line
#
$in_comment = 0;
while( <> ) {

 #
 # Inside a multi-line comment: ignore every line until the closing
 # delimiter shows up, then remove everything up to and including it.
 #
 if( $in_comment ) {
    next unless s/^.*?-->//s;
    $in_comment = 0;
 }

 #
 # Remove comments that open and close on this line
 #
 s {
        <!--    # Match the opening delimiter.
        .*?     # Match a minimal number of characters.
        -->     # Match the closing delimiter.
 } []gsx;

 #
 # An opening delimiter that is still left starts a multi-line comment.
 # Keep the text in front of it and drop the rest of the line.
 #
 if( s/<!--.*//s ) {
    $in_comment = 1;
 }

 print;  # Print whatever is left of the current line
}

----------------------

  Note that the code is a simple modification of one of the examples
from the perlre man page (http://perldoc.perl.org/perlre.html).


 Hopefully this will suit your purposes!


kells


----- Original Message -----
From: "Paul Moloney" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Thursday, March 29, 2007 6:45 AM
Subject: [docbook-apps] Stripping comments


>
> One task I have is to package our source XML files for use by
> integrators; one thing I'd like to do is first strip the comments from
> these files as they may contain sensitive information.
>
> I was thinking that this could be done by processing each file through
> Saxon using a stylesheet which strips out comments and outputs the XML
> again. But rather than risk reinventing the wheel, I was wondering if
> anyone out there has implemented a DocBook comment stripper in their
> build process?
>
> Thanks,
>
> P.
> --
> View this message in context:
> http://www.nabble.com/Stripping-comments-tf3486783.html#a9734912
> Sent from the docbook apps mailing list archive at Nabble.com.

