Re: deleting HTML tag...but not everyone

Jenda Krynicky Thu, 29 Jul 2004 06:49:08 -0700

From: Francesco del Vecchio <[EMAIL PROTECTED]>
> I have a problem with a Regular expression.
> I have to delete from a text all HTML tags but not the DIV one
> (keeping all the parameters in the tag).


Don't do that!

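You should use an HTML parser module instead of regexps. Parsing HTML 
is not as trivial as it may seem: attribute values may contain ">", 
tags may span several lines, and comments and scripts need special 
handling.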


You may like HTML::JFilter (based on HTML::Parser):

use HTML::JFilter;
# whitelist: only <div> is allowed, keeping its "section" and "style" attributes
my $filter = HTML::JFilter->new(<<'*END*');
div: section style
*END*
my $filteredHTML = $filter->doSTRING($enteredHTML);

# http://jenda.krynicky.cz/#HTML::JFilter
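If you would rather work with HTML::Parser directly, the same whitelist idea can be written against its version 3 callback API. This is a minimal sketch, not how HTML::JFilter is implemented; the function name keep_only_div is made up for the example. It keeps <div> and </div> tags verbatim (all attributes included), turns <br> into a newline as the original script does, keeps the text, and drops every other tag.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: keep only <div>/</div> tags, drop all other markup.
sub keep_only_div {
    my ($html) = @_;
    my $out = '';

    # Called for both start and end tags; tagname is lower-cased by default.
    my $tag_handler = sub {
        my ($tagname, $text) = @_;
        if    ($tagname eq 'div') { $out .= $text }   # original markup, attributes included
        elsif ($tagname eq 'br')  { $out .= "\n" }    # <br> becomes a newline
        # any other tag is silently dropped
    };

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h     => [ $tag_handler,          'tagname, text' ],
        end_h       => [ $tag_handler,          'tagname, text' ],
        text_h      => [ sub { $out .= shift }, 'text' ],
    );
    $p->parse($html);
    $p->eof;    # flush any text still buffered by the parser
    return $out;
}

print keep_only_div('-product two <br><D> -product three <br><dIV section=true>=====<Br></DIV>'), "\n";
```

Comments and declarations such as <!DOCTYPE> are dropped too, because no handler is registered for them.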

> I've done this:
> 
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> #!/usr/bin/perl
> use strict;
> my $test=<<EOS;
> <html><head><meta content="MSHTML 6.00.2800.1400" name="GENERATOR">
> </head><body><font face="Courier New" size=2>
> =========SUPER SAVING========= <br>
> -product one <br>
> -product two <br><D>
> -product three <br><dIV section=true>
> ============================== <Br></DIV>
> <br><br></font></body>
> </html>
> EOS
> $test=~s/<br>/\n/ig;
> $test=~s/<^[DIV](.*?)>//ig;
> print $test;
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> With this I can have ALMOST what I want: I delete all HTML tags except
> the <DIV> one, but I also keep a <D> tag, and I delete the </DIV> tag
> that I would like to keep.
> 
> The problem is in the ^[DIV] part of my regex: the "DIV" string is
> used as a list of chars and not as a whole word. Is there a way to
> achieve my goal?

Drop the []. [] denotes a character class: [DIV] matches one single 
character that is D, I or V, not the word DIV.
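A small demonstration of the difference (the sample strings are made up for illustration). The first loop compares the class [DIV] with the literal word DIV; the second shows that the negated class [^DIV] is still a single character, so when it follows "<" it only ever looks at the first character of the tag.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# [DIV] is a character class: it matches ONE character that is D, I or V.
# DIV without brackets matches the three-letter word.
for my $s (qw(D I V DIV X)) {
    my $class = $s =~ /^[DIV]$/ ? 'yes' : 'no';
    my $word  = $s =~ /^DIV$/   ? 'yes' : 'no';
    printf "%-3s  [DIV]: %-3s  DIV: %s\n", $s, $class, $word;
}

# [^DIV] is still one character, just negated: NOT D, I or V (any case, with /i).
# Right after "<" it inspects only the first character of the tag.
for my $tag ('<D>', '<dIV section=true>', '</DIV>', '<font>') {
    my $verdict = $tag =~ /^<[^DIV]/i ? 'deleted' : 'kept';
    print "$tag => $verdict\n";
}
```

The second loop reports <D> and <dIV section=true> as kept and </DIV> and <font> as deleted, which matches the symptoms described in the question.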

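Also, ^ negates only when it is the first character inside a character 
class; elsewhere it is an anchor (start of the string, or of a line 
under /m), so "<^" can never match. Either way [^DIV] still excludes 
single characters, not a whole word.
In this case you would have to use a negative look-ahead, (?!...).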
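A minimal sketch of the negative look-ahead approach (the sub name strip_all_but_div is hypothetical). The look-ahead (?!/?div\b) rejects a tag only when its name is div or /div, so both the opening and the closing DIV survive, while <D> and even <divider> are removed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: delete every tag except <div ...> and </div>.
sub strip_all_but_div {
    my ($html) = @_;
    # Turn <br> (and <br/>) into newlines, as the original script does.
    $html =~ s{<br\s*/?>}{\n}gi;
    # <             a literal "<"
    # (?!/?div\b)   negative look-ahead: fail if the tag name is div or /div
    # [^>]*>        the rest of the tag up to the first ">"
    $html =~ s{<(?!/?div\b)[^>]*>}{}gi;
    return $html;
}

my $test = '-product two <br><D> -product three <br><dIV section=true>=====<Br></DIV>';
print strip_all_but_div($test), "\n";
```

It still has the usual regex weaknesses mentioned above: a ">" inside a quoted attribute value, for instance, will cut the tag short.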

Read
        perldoc perlretut
        perldoc perlre

Jenda
===== [EMAIL PROTECTED] === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed 
to get drunk and croon as much as they like.
        -- Terry Pratchett in Sourcery



