On Thu, Aug 6, 2009 at 9:02 PM, Oleg Goldshmidt <[email protected]> wrote:
> Erez D <[email protected]> writes:
>
> > hi
> > i have an html file with a few different instances of:
> > <span class="myclass">
> > ... some html, e.g. <B> blah blah <a href=....> </a> </b>
> > </span>
> > i want to remove these instances.
> > ( the html inside the <span> varies between instances, and there is a
> > non-constant number of instances)
> > i thought of replacing '<[^/]' (i.e. '<' followed by something other
> > than '/' ) with '{' and '</' with '}' and then doing parenthesis matching
> > however i need it done automatically in batch. (i can do parenthesis
> > matching in vi. can i do this in sed ?)
>
> Sed is line-oriented which will make it a bit difficult.
>
> If I understand you correctly, and you want to remove everything
> between "<span" and "span>" including the span tags themselves, *and*
> the file does not contain the span tags in comments or string literals
> or anything like that, *and* "<span" always has a matching "span>",
> then one way to do it would be
>
> $ awk 'BEGIN {RS="(<span|span>)"} NR%2==1' <filename>
>
> which will consider either "<span" or "span>" as a record separator
> and will print only the odd records (everything between "<span" and
> "span>" will be even records and will be skipped).
>
> All you need to know about awk is that it splits the input into
> records, RS is the record separator (set to a regexp in the
> beginning), and NR is the number of the current record. It prints the
> records matching the "odd NR" condition.
>
> Does this do what you want?
>

the problem is that between <span class="myclass"...> and its </span>
there may be other <span class="otherclass"...> and its </span>
that is why i wanted parenthesis matching...

thanks,
erez

>
> --
> Oleg Goldshmidt | [email protected]
>
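
The parenthesis matching asked for above can be sketched in awk with a
depth counter instead of the odd/even record trick. This is only a rough
sketch, not a tested solution for real-world HTML. It assumes the tags are
written in lowercase exactly as "<span ...>" and "</span>", that every
opening span has a matching close, that the class attribute is written as
class="name" with double quotes, and that no span tags appear inside
comments or string literals. The function name strip_span and the variable
cls are made up for this illustration.

```shell
# strip_span CLASS: read HTML on stdin, write it to stdout with every
# <span class="CLASS"> ... </span> removed, including nested spans.
# (hypothetical helper; assumes balanced, lowercase span tags)
strip_span() {
    awk -v cls="$1" '
    { doc = doc $0 "\n" }                 # slurp the whole input into one string
    END {
        depth = 0                         # > 0 while inside a span being deleted
        out = ""
        # repeatedly find the next opening or closing span tag
        while (match(doc, /<span[^>]*>|<\/span>/)) {
            tag    = substr(doc, RSTART, RLENGTH)
            before = substr(doc, 1, RSTART - 1)
            doc    = substr(doc, RSTART + RLENGTH)
            if (depth == 0) {
                out = out before          # text outside a deleted span is kept
                if (tag ~ /^<span/ && index(tag, "class=\"" cls "\""))
                    depth = 1             # start deleting at the target span
                else
                    out = out tag         # any other span tag is kept as is
            } else if (tag ~ /^<span/) {
                depth++                   # nested span inside the deleted one
            } else {
                depth--                   # a close; depth 0 ends the deletion
            }
        }
        if (depth == 0) out = out doc     # keep the tail after the last tag
        printf "%s", out
    }'
}
```

It could be used in batch as "strip_span myclass < page.html > cleaned.html"
(file names are placeholders). The counter goes up on every "<span" seen
inside the span being deleted and down on every "</span>", so the deletion
only ends at the close that matches the original <span class="myclass">.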
_______________________________________________
Linux-il mailing list
[email protected]
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il
