At 11:38 -0400 5/13/06, Alain Paradis wrote:
>Could someone please give me some direction on how I would remove all the
>HTML mark-up from many documents except for the links (anything with <a
>href="foo.html">)
I recently did something like that to remove everything but <table> and its
cousins from downloaded HTML. It uses Perl and curl.

<http://macnauchtan.com/software/FinpMod/FinpMod.html> describes it. The Perl
module is at:

<ftp://ftp.macnauchtan.com/Software/FinpMod/FinpMod.pm>

sub FMtabler is the part from which you can extract code, mostly strange
regular expressions. You could probably turn them into a text factory or a
script for the #! menu if you aren't downloading the documents from the web
anyway.

-- 
--> If you are presented a number as a percentage, and you do not clearly
understand the numerator and the denominator involved, you are surely being
lied to. <--
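For what it's worth, here is a rough sketch of the regular-expression idea applied to the original question (keep the links, drop every other tag). It is written in Python rather than Perl and is not taken from FinpMod.pm; the function name strip_all_but_links and the choice to discard comments and script/style blocks are my own assumptions for illustration.

```python
import re

# Any tag that is NOT an opening <a ...> or a closing </a>.
# The lookahead requires whitespace or '>' after the "a", so tags such as
# <abbr> or <address> are still treated as non-link tags and removed.
NON_LINK_TAG = re.compile(r"<(?!/?a(?:\s|>))[^>]*>", re.IGNORECASE)

# Comments, and script/style blocks together with their contents.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
SCRIPT_OR_STYLE = re.compile(
    r"<(script|style)\b.*?</\1\s*>", re.DOTALL | re.IGNORECASE
)


def strip_all_but_links(html: str) -> str:
    """Remove all markup except <a ...> and </a> tags (hypothetical helper)."""
    html = COMMENT.sub("", html)
    html = SCRIPT_OR_STYLE.sub("", html)
    return NON_LINK_TAG.sub("", html)


sample = '<p>See <a href="foo.html">foo</a> and <b>bold</b>.</p>'
print(strip_all_but_links(sample))
# prints: See <a href="foo.html">foo</a> and bold.
```

This is only pattern matching, not real HTML parsing: a ">" inside an attribute value, or badly broken markup, will confuse it. For a batch of reasonably clean documents it may be enough, and the same pattern should carry over to a grep-style search and replace with little change.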
