Re: lynx-dev TRST : awk script

Philip Webb Tue, 9 Nov 1999 03:31:28 -0800
i haven't read the latest messages yet, but here's an awk script
(with sed annexe), which performs the same deletions of <P>, </P> & <BR>
which i did by hand in  www.chass.utoronto.ca/~purslow/trst.html
(there's no <BR> there, but other tests suggest it works);
it does assume  <  &  >  alternate on each line (ie every tag opens
& closes on the same line), but otherwise seems robust.
to try it out, run the script on an HTML file of your choice;
disclaimer: awk & sed may perform differently on different systems.
i hear the jackals howling & see the hyenas slavering ...

awk 'BEGIN { FS = "<" ; u = 0 }   # split lines at "<" ; u = TABLE nesting depth
     $0 == "" { print $0 }
     $0 != "" { printf "%s", $1   # text before the first "<" is never a tag
        for (i = 2; i <= NF; i++) {
          t = 0
          if ($i ~ /^[Tt][Aa][Bb][Ll][Ee]([ \t][^>]*)?>/) u = u + 1
          if ($i ~ /^\/[Tt][Aa][Bb][Ll][Ee]/) { u = u - 1; if (u < 0) u = 0 }
          # inside a table, drop P, /P & BR tags (not PRE etc) but keep the text after them
          if (u > 0 && $i ~ /^(\/?[Pp]|[Bb][Rr])([ \t\/][^>]*)?>/) {
            printf "#####%s", substr($i, index($i, ">") + 1); t = 1 }
          if (t == 0) printf "<%s", $i }
        printf "\n" }
    ' "$@" |
sed 's/#####//g
'

[NB the final  ' ]
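
[a small illustration of the splitting the script relies on, offered
as a sketch only: the helper name  split_at_lt  & the sample line are
made up for this note & are not part of the script or of trst.html.
with FS set to "<", awk hands back the text before the first "<" as
field 1, & every later field starts with a tag name.]

```shell
# hypothetical helper, for illustration only: show how awk splits a line at "<"
split_at_lt() {
    awk 'BEGIN { FS = "<" }
         { for (i = 1; i <= NF; i++) printf "%d:[%s]\n", i, $i }'
}

# made-up sample line, not taken from trst.html
printf '%s\n' 'text<TABLE><TD><P>cell</P></TD></TABLE>' | split_at_lt
```

[on the sample line this prints 7 fields: field 1 is "text", field 2 is
"TABLE>", field 4 is "P>cell"; that is why the patterns in the script
are anchored with ^ & why the text after a deleted tag is whatever
follows the first ">" in the field.]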

-- 
========================,,============================================
SUPPORT     ___________//___,  Philip Webb : [EMAIL PROTECTED]
ELECTRIC   /] [] [] [] [] []|  Centre for Urban & Community Studies
TRANSIT    `-O----------O---'  University of Toronto