--- Rasoul Hajikhani <[EMAIL PROTECTED]> wrote:
> Hi there,
> I am trying to match an expression that would perform different tasks
> depending on the returned value:
>
> #if (arguments begin with "<A HREF=")
> if ($args =~ /^\<A HREF=.*/i)
> {
> # do this
> }
> else
> {
> # do this
> }
>
> but it always fails to return any thing. Can some one tell me what am I
> doing wrong? Appreciate all the help...
> -r
Parsing HTML with a regular expression is difficult and error-prone. I would strongly
recommend against it. The following snippet only works for a very small test case:
foreach my $args ( <DATA> ) {
    # \s+ (not \s*) between "a" and "href", so that "<ahref=" is not accepted
    if ($args =~ /^<\s*a\s+href\s*=/i) {
        print "HREF: $args";
    } else {
        print "Not an HREF: $args";
    }
}
__DATA__
<a href="test.cgi">
<a hREf = "something_else.htm">
<a name="bob">
<a href = '#bob'>
Knowing how your data gets into the system is at least as important as how your data
leaves the system. Knowing your data source allows you to craft a better solution to
the problem. For example, consider your regex:
/^\<A HREF=.*/i
What is the source of the data? Is it generated by another process or could humans
affect it? There are several places where you can insert whitespace into that anchor
tag, have valid HTML, and cause your regex to fail. Here's an example which will break
your code *and* mine:
<a
href=
"somefile.html"
>
That's annoying, but some of the documents I get have HTML formatted like that. Also,
you don't need the dot star (.*) at the end. You don't use what it matches, and forcing
the regex engine to match it is wasteful.
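To illustrate the point about the dot star, here is a minimal sketch. The two sub names
and the sample strings are made up for this example. A trailing .* can always match zero
or more characters, so both patterns should accept and reject exactly the same input:

```perl
use strict;
use warnings;

# Both subs are hypothetical helpers; they differ only in the trailing ".*".
sub match_with_dot_star    { return $_[0] =~ /^<A HREF=.*/i ? 1 : 0 }
sub match_without_dot_star { return $_[0] =~ /^<A HREF=/i   ? 1 : 0 }

# Sample inputs invented for illustration.
my @samples = (
    '<A HREF="index.html">Home</A>',
    '<a href="test.cgi">',
    '<a name="bob">',
    'plain text',
);

for my $args (@samples) {
    printf "%-32s with: %d  without: %d\n",
        $args,
        match_with_dot_star($args),
        match_without_dot_star($args);
}
```

The only thing the shorter pattern changes is that the engine stops as soon as it has
seen "<A HREF=" instead of scanning to the end of the line.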
I would recommend learning to use HTML::TokeParser or a similar module to parse HTML.
If you are only extracting links, try HTML::LinkExtor.
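As a rough sketch of the HTML::TokeParser approach (assuming the module is installed from
CPAN; the sub name extract_hrefs and the sample HTML are made up for this example), the
following pulls the href out of each anchor tag, including one split across several lines:

```perl
use strict;
use warnings;
use HTML::TokeParser;    # part of the HTML-Parser distribution on CPAN

# Hypothetical helper: return the href values of all <a> tags in an HTML string.
sub extract_hrefs {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot create parser: $!";
    my @hrefs;
    # get_tag('a') skips ahead to the next <a> start tag, or returns undef at the end
    while (my $tag = $parser->get_tag('a')) {
        my $attr = $tag->[1];    # hash ref of attributes, names lowercased
        push @hrefs, $attr->{href} if defined $attr->{href};
    }
    return @hrefs;
}

# Sample input invented for illustration.
my $html = <<'END_HTML';
<a href="test.cgi">
<a hREf = "something_else.htm">
<a name="bob">
<a
href=
"somefile.html"
>
END_HTML

print "HREF: $_\n" for extract_hrefs($html);
```

The parser deals with attribute case, whitespace around the equals sign, and tags that
span lines, so none of that has to be encoded in a regex.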
Cheers,
Curtis "Ovid" Poe
=====
Senior Programmer
Onsite! Technology (http://www.onsitetech.com/)
"Ovid" on http://www.perlmonks.org/