On Nov 25, 2013, at 10:55 AM, Mike Blezien wrote:

> Hello,
>  
> Regular expressions have never been my strong suit, so I'm hoping to get a 
> little help with a line in a file from which I need to extract a portion.
>  
> The text I need to extract from this line in the file is just the date, 
> "November 21, 2013":
>  
> Posted by <a href="mailto:someem...@email.com">Some Name</a> on November 21, 
> 2013 at 23:21:58:<p>
>  
> What regex would I use to extract the date from the line?

The usual advice applies: don't use regular expressions to parse HTML. However, 
lots of people do it anyway, myself included. Your success at extracting usable 
data depends upon how rigid the format of the HTML is from page to page.
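A minimal sketch of that dependence. The line quoted above is wrapped between "21," and "2013"; that is probably just the mail client, but assume for illustration that the file itself contains the wrap (a space and then a newline). The two hypothetical patterns `$rigid` and `$tolerant` differ only in whether they accept exactly one whitespace character or any run of them:

```perl
use strict;
use warnings;

# Hypothetical input: the sample line wrapped the way it appears in the
# quoted email, with a space and a newline between "21," and "2013".
my $wrapped = 'Posted by <a href="mailto:someem...@email.com">Some Name</a> on November 21, '
            . "\n"
            . '2013 at 23:21:58:<p>';

# Rigid: exactly one whitespace character between each piece.
my $rigid = qr{ on \s (\w+ \s \d{1,2} , \s \d{4}) \s at }x;

# Tolerant: any run of whitespace (spaces, tabs, newlines).
my $tolerant = qr{ on \s+ (\w+ \s+ \d{1,2} , \s+ \d{4}) \s+ at }x;

# In list context a match returns its captures, or an empty list on failure.
my ($rigid_date)    = $wrapped =~ $rigid;
my ($tolerant_date) = $wrapped =~ $tolerant;

# The tolerant capture still contains the " \n"; collapse it to one space.
$tolerant_date =~ s/\s+/ /g if defined $tolerant_date;

print defined $rigid_date    ? "rigid: $rigid_date\n"       : "rigid: no match\n";
print defined $tolerant_date ? "tolerant: $tolerant_date\n" : "tolerant: no match\n";
```

Here the rigid pattern reports no match and the tolerant one finds the date. The price of tolerance is that the capture can carry stray whitespace, which the substitution cleans up.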

In your case, if the date always comes right after a closing link tag ('</a>') 
and the word 'on', and is always followed by 'at' and a time, you can use this:

  if( $line =~ m{ </a> \s+ on \s+ (\w+ \s \d{1,2} , \s \d{4}) \s at}x ) {
    print "The date is $1\n";
  }else{
    print "No match\n";
  }
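A self-contained version of the same test that you can run as-is. It assumes the lines would normally be read from your file; here they sit in a hypothetical `@lines` array (the sample line from the question, rejoined onto one line, plus one line without a date) so the script needs no input file:

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for the lines of the file.
my @lines = (
    'Posted by <a href="mailto:someem...@email.com">Some Name</a> on November 21, 2013 at 23:21:58:<p>',
    'A line with no posting date',
);

my @dates;
for my $line (@lines) {
    if ( $line =~ m{ </a> \s+ on \s+ (\w+ \s \d{1,2} , \s \d{4}) \s at }x ) {
        push @dates, $1;    # $1 holds the text matched by the first ( )
        print "The date is $1\n";
    }
    else {
        print "No match\n";
    }
}
```

To read a real file instead, open it with `open my $fh, '<', $path or die "Can't open $path: $!";` and loop with `while (my $line = <$fh>) { ... }`, where `$path` is a placeholder for your file name.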

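Note that I am using the extended regular expression syntax (the x modifier), 
which ignores literal whitespace in the pattern so it can be spaced out for 
readability; whitespace in the input is matched explicitly with \s.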
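To see what the x modifier changes, here is the same pattern written both ways against the sample line (rejoined onto one line). Without /x every literal space in the pattern is significant, so the second version has to be written with no spaces at all:

```perl
use strict;
use warnings;

my $line = 'Posted by <a href="mailto:someem...@email.com">Some Name</a> on November 21, 2013 at 23:21:58:<p>';

# With /x: spaces in the pattern are ignored, \s matches real whitespace.
my ($with_x) = $line =~ m{ </a> \s+ on \s+ (\w+ \s \d{1,2} , \s \d{4}) \s at }x;

# Without /x: the same pattern with all layout spaces removed.
my ($without_x) = $line =~ m{</a>\s+on\s+(\w+\s\d{1,2},\s\d{4})\sat};

print "$with_x\n";       # prints "November 21, 2013"
print "$without_x\n";    # prints "November 21, 2013"
```

One caveat with /x: a literal space is ignored and a literal # starts a comment, so if you actually need to match either one, escape it (\  or \#) or put it in a character class.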



