At 11:38 -0400 5/13/06, Alain Paradis wrote:
>Could someone please give me some direction on how I would remove all the 
>HTML mark-up from many documents except for the links (anything with <a 
>href="foo.html">)

I recently did something like that to remove everything but <table> and its 
cousins from downloaded HTML. It uses Perl and curl.

<http://macnauchtan.com/software/FinpMod/FinpMod.html> describes it.

The Perl module is at:
<ftp://ftp.macnauchtan.com/Software/FinpMod/FinpMod.pm>
The sub FMtabler is the part to borrow code from; it is mostly strange 
regular expressions. If you aren't downloading the documents from the web 
anyway, you could probably turn those expressions into a Text Factory or a 
script for the #! menu.
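
For the original question (keep only the links), a minimal regex sketch 
along the same lines might look like the following. It is written in Python 
rather than Perl and is not taken from FinpMod.pm; the names NON_ANCHOR_TAG 
and strip_all_but_links are made up for illustration. It assumes reasonably 
tidy HTML with no ">" inside attribute values, and it keeps every <a ...> 
tag, including ones without an href.

```python
import re

# Hypothetical helper, not part of FinpMod.pm.
# Matches any tag whose name is not exactly "a": the negative lookahead
# rejects "<a ...>" and "</a>" but still lets "<abbr>", "<area>" etc. match.
NON_ANCHOR_TAG = re.compile(r"<(?!/?a\b)[^>]*>", re.IGNORECASE)

def strip_all_but_links(html: str) -> str:
    """Delete every tag except anchor tags; text between tags is kept."""
    return NON_ANCHOR_TAG.sub("", html)

if __name__ == "__main__":
    sample = '<p>See <a href="foo.html">foo</a> and <b>bar</b>.</p>'
    print(strip_all_but_links(sample))
```

The same pattern should work as a grep search in a Text Factory with an 
empty replacement, though I have not tried it there.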
-- 

--> If you are presented a number as a percentage, and you do not clearly 
understand the numerator and the denominator involved, you are surely being 
lied to. <--

