Erez D writes:

> the problem is that between <span class="myclass"...> and its </span> there
> may be other <span class="otherclass"...> and its </span>
> that is why i wanted parenthesis matching...

Ah, nesting; I was afraid it would pop up. Try this script:

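#!/bin/awk -f
# Delete every <span class="myclass" ...> element together with its
# content, including any nested <span>s. Needs an awk that accepts a
# regular expression as RS, such as gawk or mawk.

BEGIN {
        RS="<span[ \t\n]+class=\"myclass\"[^>]*>"
}

# Text before the first myclass span passes through unchanged.
NR==1 {print}

# Every later record begins right after an opening myclass tag:
# count nested <span>/</span> tags until the matching </span>,
# then print whatever follows it.
NR > 1 {
        record=$0
        nesting=1
        while (nesting > 0) {
                if (!match(record, /<\/?span[^>]*>/))
                        break   # unbalanced input: stop rather than loop forever
                tag=substr(record, RSTART, RLENGTH)
                if (tag ~ /^<\/span/)
                        nesting--
                else
                        nesting++
                # drop everything up to and including this tag
                record=substr(record, RSTART+RLENGTH)
        }
        print record
}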

Each removed span leaves a newline behind (awk's default ORS), but I hope extra whitespace here and there is irrelevant for HTML.
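To try the approach quickly, here is a self-contained sh wrapper around an equivalent awk program plus a made-up sample line. The function name strip_myclass and the sample HTML are hypothetical, and it assumes the awk on your PATH accepts a regular expression as RS, which gawk and mawk do (POSIX leaves a multi-character RS unspecified).

```shell
#!/bin/sh
# Hypothetical helper: removes <span class="myclass" ...> elements (and their
# content, nested spans included) from stdin. Assumes gawk or mawk as awk.
strip_myclass() {
    awk '
    BEGIN { RS = "<span[ \t\n]+class=\"myclass\"[^>]*>" }
    NR == 1 { print }
    NR > 1 {
        record = $0
        nesting = 1                      # we start inside one myclass span
        while (nesting > 0) {
            if (!match(record, /<\/?span[^>]*>/))
                break                    # unbalanced input: stop
            tag = substr(record, RSTART, RLENGTH)
            if (tag ~ /^<\/span/)
                nesting--
            else
                nesting++
            record = substr(record, RSTART + RLENGTH)
        }
        print record
    }'
}

# Made-up sample: a myclass span with an otherclass span nested inside.
printf '%s\n' '<p>keep <span class="myclass" id="a">drop <span class="otherclass">also drop</span> drop</span> tail</p>' |
    strip_myclass
```

The nested otherclass span goes away together with its myclass parent, and only the text around it survives. If the leftover newlines do matter, starting awk as awk -v ORS= '...' inside the function should suppress them, since the program never sets ORS itself.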

-- 
Oleg Goldshmidt

_______________________________________________
Linux-il mailing list
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il
