From: Francesco del Vecchio <[EMAIL PROTECTED]>
> I have a problem with a Regular expression.
> I have to delete from a text all HTML tags but not the DIV one
> (keeping all the parameters in the tag).

Don't do that! You should use an HTML parser module instead of regexps.
Parsing HTML is not as trivial as it may seem.

You may like HTML::JFilter (based on HTML::Parser):

use HTML::JFilter;
$filter = new HTML::JFilter <<'*END*';
div: section style
*END*
$filteredHTML = $filter->doSTRING($enteredHTML);
# http://jenda.krynicky.cz/#HTML::JFilter

> I've done this:
>
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> #!/usr/bin/perl
> use strict;
> my $test=<<EOS;
> <html><head><meta content="MSHTML 6.00.2800.1400" name="GENERATOR">
> </head><body><font face="Courier New" size=2>
> =========SUPER SAVING========= <br>
> -product one <br>
> -product two <br><D>
> -product three <br><dIV section=true>
> ============================== <Br></DIV>
> <br><br></font></body>
> </html>
> EOS
> $test=~s/<br>/\n/ig;
> $test=~s/<^[DIV](.*?)>//ig;
> print $test;
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> with this I can have ALMOST what I want. I delete all HTML tags but the
> <DIV> one, but I also keep a <D> tag and I delete the </DIV> tag that I
> would like to keep.
>
> The problem is in the ^[DIV] part of my regex... the "DIV" string is
> used as a list of chars and not as a whole word. Is there a way to
> achieve my goal?

Drop the []. [] defines a character class, that is a set of single
characters, so [DIV] matches one character that is D, I or V, not the
word DIV. Also, ^ has a special meaning only at the very beginning of a
regexp (where it anchors the match) or of a character class (where it
negates the class). To match every tag that is not DIV you would have
to use a negative look-ahead.

Read
   perldoc perlretut
   perldoc perlre

Jenda
===== [EMAIL PROTECTED] === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed
to get drunk and croon as much as they like.
   -- Terry Pratchett in Sourcery
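
For reference, the look-ahead approach mentioned above could look roughly like the sketch below. It is only an illustration, not a replacement for a real parser: the helper name strip_tags_except_div and the shortened sample string are made up for this example, and the pattern assumes that no attribute value contains a ">" character.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): removes every tag
# except <div ...> and </div>, keeping the div attributes untouched.
sub strip_tags_except_div {
    my ($html) = @_;

    # Turn <br> (in any case, with optional trailing slash) into a newline.
    $html =~ s/<br\s*\/?>/\n/gi;

    # (?!\/?div\b) is a negative look-ahead: right after "<" the text must
    # NOT be "div" or "/div" as a whole word. Only then is the tag removed.
    $html =~ s/<(?!\/?div\b)[^>]*>//gi;

    return $html;
}

# Shortened, made-up sample in the spirit of the original test data.
my $test = '<font size=2>-product two <br><D>-product three '
         . '<br><dIV section=true>=====<Br></DIV><br></font>';

print strip_tags_except_div($test), "\n";
```

The \b after div matters: without it a tag such as <divider> would be kept as well. Comments, scripts and attribute values containing ">" will still confuse this pattern, which is the reason a parser module remains the safer route.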