Here's a quick Perl solution that works line by line rather than
reading everything into memory, and that seems to handle some of the
edge cases. Try it out on a few files to verify that everything is
okay before completely trusting it, though. :)
Copy the lines between the two '------------' rows into a file (say
strip_xml_comments.pl).
If you are on Unix, make the script executable before running it:
chmod 755 strip_xml_comments.pl
Make a backup copy of any and all files that you'll be using. (The
script should work fine as is, but it's *MUCH* better to be safe than
sorry. :)
Now you should be able to run the script on a copy of your input file
(on Unix, prefix the command with ./ if the current directory is not
in your PATH):
strip_xml_comments.pl my_xml_input_file.xml
The script will also make its own backup copy of the input file, with
'.orig' appended to the name. (Please don't just rely on this feature --
make your own backup.)
Verify that everything looks okay and integrate it into your
application stream.
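One quick way to do the verification is to check that no comment
delimiters survive in the processed file. The following is only a
sketch: the helper name leftover_comment_lines and the file name
check_comments.pl are made up for illustration and are not part of the
stripping script. A hit is a prompt to look at that line, not proof of
a problem, since '-->' can also appear in ordinary text or in CDATA.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the numbers of any lines that still
# contain an XML comment delimiter.
sub leftover_comment_lines {
    my @lines = @_;
    my @hits;
    for my $i (0 .. $#lines) {
        push @hits, $i + 1 if $lines[$i] =~ /<!--|-->/;
    }
    return @hits;
}

# Usage: perl check_comments.pl my_xml_input_file.xml
# (Only runs when a file name is given on the command line.)
if (@ARGV) {
    my @hits = leftover_comment_lines(<>);
    print @hits ? "Delimiters remain on line(s): @hits\n"
                : "No comment delimiters found.\n";
}
```

Unlike the stripping script, this check reads the whole file into
memory, which should be fine for a one-off inspection. If several
files are given, the line numbers count across all of them.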
Here's the script:
----------------------
#!/usr/bin/perl -w -i.orig
#
# NB: Delete the '.orig' portion if backup copies are not desired
#
#
# Delete XML comments.
#
#
# Go through every file given on the command line
#
$in_comment= 0;
while( <> ) {
    #
    # Inside a multi-line comment: skip every line until the closing
    # delimiter shows up, then remove everything up to that point.
    #
    if( $in_comment ) {
        next unless s/^.*?-->//s;
        $in_comment= 0;
    }
    #
    # Match inline comments
    #
    s {
        <!--    # Match the opening delimiter.
        .*?     # Match a minimal number of characters.
        -->     # Match the closing delimiter.
    } []gsx;
    #
    # Match the start of a multi-line comment.
    # NB: All in-line comments have already been removed, so keep the
    # text before the opening delimiter and drop the rest of the line.
    #
    if( s/<!--.*//s ) {
        $in_comment= 1;
    }
    print;   # Print whatever is left of the current line
}
----------------------
Note that the code is a simple modification of one of the examples
from the perlre man page (http://perldoc.perl.org/perlre.html).
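To try the line-by-line idea on sample text without touching any
files, the same logic can be wrapped in a function. This is a sketch
rather than the script itself: the name strip_comments is made up for
illustration, and it assumes that text before an opening '<!--' and
after a closing '-->' should be kept.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a list of lines and returns them with XML
# comments removed. State is carried from one line to the next.
sub strip_comments {
    my @out;
    my $in_comment = 0;
    for my $line (@_) {
        my $text = $line;                 # work on a copy
        if ($in_comment) {
            # Drop everything up to and including the closing delimiter;
            # skip the line entirely if the comment has not ended yet.
            next unless $text =~ s/^.*?-->//s;
            $in_comment = 0;
        }
        $text =~ s/<!--.*?-->//gs;        # comments closed on this line
        $in_comment = 1 if $text =~ s/<!--.*//s;   # comment left open
        push @out, $text if length $text;
    }
    return @out;
}

my @sample = (
    "<para>one</para> <!-- inline -->\n",
    "<para>two</para> <!-- starts here\n",
    "still inside the comment\n",
    "ends here --> <para>three</para>\n",
);
print strip_comments(@sample);
```

Note that the line where a multi-line comment opens and the line where
it closes end up joined, because the newline inside the comment is
removed along with the rest of it. Like the script, this does not
treat CDATA sections specially, so a '<!--' inside CDATA would be
handled as a comment.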
Hopefully this will suit your purposes!
kells
----- Original Message -----
From: "Paul Moloney" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Thursday, March 29, 2007 6:45 AM
Subject: [docbook-apps] Stripping comments
>
> One task I have is to package our source XML files for use by
> integrators; one thing I'd like to do is first strip the comments from
> these files as they may contain sensitive information.
>
> I was thinking that this could be done by processing each file through
> Saxon using a stylesheet which strips out comments and outputs the XML
> again. But rather than risk reinventing the wheel, I was wondering if
> anyone out there has implemented a DocBook comment stripper in their
> build process?
>
> Thanks,
>
> P.
> --
> View this message in context:
> http://www.nabble.com/Stripping-comments-tf3486783.html#a9734912
> Sent from the docbook apps mailing list archive at Nabble.com.