-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I'm writing a sed script that will parse the *broken* output of man2html. I say broken because the output isn't W3C compliant (HTML or XHTML). I'd like to modify it so that the final result is XHTML 1.0 compliant.

I'm running into a problem where the output doesn't close the <p>, <dt>, or <dd> tags. XHTML requires every element to be closed, including these. So the problem I'm having is taking note of the starting tag, grabbing the paragraph that follows it, and then inserting the closing tag. What I've got /sort of/ works, but not quite.
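For what it's worth, the "note the open tag, collect the paragraph, then close it" idea can be sketched on its own like this. It is only an illustration under assumptions of mine: GNU sed, a fragment that begins with a <p> alone on its own line, and every paragraph running until the next such <p> or the end of input. The close_p name is made up, and <dt>/<dd> and the rest of the page are not handled.

```shell
#!/bin/sh
# close_p: hypothetical helper; reads a fragment on stdin and writes it
# back with </p> appended to the last line of each paragraph.
close_p() {
  sed -n '
    # A lone <p> starts a new paragraph.
    /^<p>$/{
      # Swap: pattern space gets the paragraph collected so far,
      # hold space gets the new <p> line.
      x
      # If a previous paragraph was collected, close it and print it.
      /./{
        s/$/<\/p>/
        p
      }
      # Start the next cycle; the hold space keeps the new <p>.
      d
    }
    # Any other line is appended to the paragraph in the hold space.
    H
    # At end of input, close and print the last paragraph.
    ${
      x
      s/$/<\/p>/
      p
    }
  '
}

# Small demonstration on a made-up two-paragraph fragment.
printf '<p>\nfirst para\nmore text\n<p>\nsecond para\n' | close_p
```

The point of the swap (x) is that the paragraph is only printed once the next <p> (or end of input) shows that it is complete, so the closing tag can be added exactly once per paragraph.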
Here's a sample that has been run through the script, but without the part that modifies the <p> elements:

<p>
Regular expression support is provided by the PCRE library package,
which is open source software, written by Philip Hazel, and copyright
by the University of Cambridge, England. See
<a href="http://www.pcre.org/">http://www.pcre.org/</a>
.
<p>
Nmap can optionally link to the OpenSSL cryptography toolkit, which is
available from
<a href="http://www.openssl.org/">http://www.openssl.org/</a>
.

Here's the entire sedscr (sans comments):

/^$/{
N
/^\n$/d
}
/^Content-type: text\/html/c\
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
s%<\(HTML\|P\|HEAD\|TITLE\|BODY\|STRONG\|EM\|H[123456]\|D[DLT]\|T[TDRH]\)>%\L<\1>%g
s%<\/\(HTML\|P\|A\|HEAD\|TITLE\|BODY\|STRONG\|EM\|H[123456]\|D[DLT]\|T[TDRH]\)>%\L</\1>%g
s%<BR>%<br />%g
s%<HR>%<hr />%g
s%<[Dd][Ll] [Cc][Oo][Mm][Pp][Aa][Cc][Tt]>%<dl compact="compact">%
s%<A HREF\(.*\)>%<a href\1>%g
s%<A NAME\(.*\)>%<a name\1>%g
/^<[IB]>.*$/{
N
s%\(<[IB]>\)\(.*\)\(<\/[IB]>\)\n%\L\1\2\L\3%
}
/^<[ib]>.*$/{
N
s%\n%%
}
s%<[IB]>%\L&%
s%<\/[IB]>%\L&%
/<body>/,/<\/body>/{
/<p>/!{
H
d
}
/<p>/{
x
s/$/<\/p>/
G
}
}
/^<p>$/,/<\p>$/{
N
/^\n<p>$/d
}

Here's the funkiness after parsing with the last part (/<body>/,/<\/body>/{) enabled:

<p>
<p>
Regular expression support is provided by the PCRE library package,
which is open source software, written by Philip Hazel, and copyright
by the University of Cambridge, England. See
<a href="http://www.pcre.org/">http://www.pcre.org/</a>
.</p>
<p>
<p>
Nmap can optionally link to the OpenSSL cryptography toolkit, which is
available from
<a href="http://www.openssl.org/">http://www.openssl.org/</a>
.</p>

(Just in case you were wondering, this IS from the nmap man page. ;-)

Thanks.
- --
gentux
echo "hfouvyAdpy/ofu" | perl -pe 's/(.)/chr(ord($1)-1)/ge'

gentux's gpg fingerprint ==> 34CE 2E97 40C7 EF6E EC40 9795 2D81 924A 6996 0993
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFDOMBkLYGSSmmWCZMRAnnrAJwKNqr+/OgBdDD8X8PXX6rpKUfaxQCfU9PW
Bs2oA/76RYFbbc7DWEpfTM8=
=gcc/
-----END PGP SIGNATURE-----
--
gentoo-user@gentoo.org mailing list