Erez D <[email protected]> writes:
> the problem is that between <span class="myclass"...> and its </span> there
> may be other <span class="otherclass"...> and its </span>
> that is why i wanted parenthesis matching...
Ah, nesting. I was afraid it would pop up. Try this script:
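#!/bin/awk -f
# Needs gawk: regex RS and the three-argument match() are gawk extensions.
BEGIN {
    # Each record begins just after an opening <span class="myclass" ...> tag.
    RS = "<span[ \t\n]+class=\"myclass\"[^>]*>"
}
# Everything before the first such tag passes through unchanged.
NR == 1 { print }
NR > 1 {
    record = $0
    nesting = 1     # the opening tag was consumed by RS
    ends = 0        # number of characters of $0 consumed so far
    # Stop at the balancing </span>, or when no tags are left (unbalanced input).
    while (nesting > 0 && match(record, /<\/?span[^>]*>/, tag)) {
        if (tag[0] ~ /^<\/span/)
            nesting--
        else
            nesting++
        ends += RSTART + RLENGTH - 1
        record = substr(record, RSTART + RLENGTH)
    }
    print substr($0, ends + 1)
}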
Note that print appends a newline to every chunk it outputs, so extra
whitespace will show up here and there. I hope that is irrelevant for HTML.
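To see the depth counting in isolation, here is a minimal sketch that avoids the gawk-only features (regex RS, three-argument match()) and should run under any POSIX awk. The function name skip_to_balanced_close is made up for illustration. It assumes its input begins just after an opening <span ...> tag, and it prints whatever follows the </span> that balances that tag.

```shell
# Hypothetical helper, for illustration only.
# Input:  text that starts just after an opening <span ...> tag.
# Output: the text that follows the balancing </span>.
skip_to_balanced_close() {
  awk '
    { buf = buf $0 "\n" }                      # slurp all input into one string
    END {
      depth = 1                                # the opening tag is already consumed
      while (depth > 0 && match(buf, /<\/?span[^>]*>/)) {
        if (substr(buf, RSTART + 1, 1) == "/")
          depth--                              # closing tag: one level up
        else
          depth++                              # nested opening tag: one level down
        buf = substr(buf, RSTART + RLENGTH)    # drop everything through this tag
      }
      printf "%s", buf
    }'
}
```

For example, piping the line 'a<span class="otherclass">b</span>c</span>tail' through skip_to_balanced_close prints 'tail'. If the input has no balancing </span>, the loop stops when no tags are left and prints whatever text remains after the last tag it saw.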
--
Oleg Goldshmidt | [email protected]
_______________________________________________
Linux-il mailing list
[email protected]
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il