On Friday, 1 December 2006, 06:05, Omega -1911 wrote:
> Hello all,
>
> I am trying to parse calendar events for a rss feed into variables. Can
> someone help with building the following regex or point me in the direction
> of some good examples? Thanks in advance.
>
> Here is what I have tried: (I don't know much about complex regex's as you
> see)
> $mystring =~ /.+(<p><li><b>)(\w+) (<FONT COLOR=\"\#990000\">)(\w+)(\[Ref
> \#(\d+\])(.+)$/);
>
>
> Here is a sample string:
> <p><li><b> DATE <FONT COLOR="#990000">TITLE</FONT></b> EVENT <a href="
> http://www.mysite.com"target="_new">www.mysite.com</a> [Ref #67579]</li>
>
> What I would like to pull out is the TITLE && EVENT information. The sample
> string is the format for each event. Any takers on this? Again, thanks for
> any help.

If you *really* want to do it with a regex and not a parser (XML::LibXML,
XML::Simple, etc.), here is one possibility. Note, however, that a regex is
very fragile when the format changes or the input contains unexpected
characters.

In the regex below I try to be flexible about white space in the input; one
could also be more specific in the part following the info to extract.
There are generally two somewhat contradictory aims:

- be as specific as possible, so as not to match unwanted content
- be liberal, so as to handle format changes

How did you develop the regex? It does not seem to match the way you wanted.
One way is to build it step by step: start by matching the strings between
<p> and </p>, check, make it more specific, check again, and so on.

Note that I escape the '#' in the regex because the /x modifier allows
comments.

BEWARE: I did not spend hours on this. It just extracts what you want from
the $input given here.

#!/usr/bin/perl

use strict;
use warnings;

# Sample data. Unlike the string in the question, each event here is
# closed with </p>, and the regex below requires that closing tag.
my $input='
<p><li><b> DATE <FONT COLOR="#990000">TITLE1</FONT></b> EVENT1 <a href="http://www.mysite.com"target="_new">www.mysite.com</a> [Ref #67579]</li></p>
<p><li><b> DATE <FONT COLOR="#990000">TITLE2</FONT></b> EVENT2 <a href="http://www.mysite.com"target="_new">www.mysite.com</a> [Ref #67579]</li></p>
';

# In list context a /g match returns all captures as one flat list, here
# (title, event) pairs, which populate the hash as title => event.
my %info = $input =~ m;
  <p>\s*
    <li>\s*
      <b>.*?
        <font\s*color\s*=\s*"\#990000"[^>]*?>\s*(.*?)\s*</font>\s*
      </b>\s*(.*?)\s*<a.*?</a>\s*\[ref[^\]]+?\]\s*
    </li>\s*
  </p>
;mgxsi;

# Print each title with its event, sorted by title.
print map { "<$_> => <$info{$_}>\n" } sort keys %info;

__END__

Dani
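
The step-by-step approach suggested above could look roughly like the sketch below. It is only an illustration, not the regex from the script: the variable names ($event, $step1 to $step3) are made up for this example, and the sample string is the one from the question with a closing </p> added, which is an assumption about the real feed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One event in the format from the question (sample data only; the
# closing </p> is an assumption, as in the script above).
my $event = '<p><li><b> DATE <FONT COLOR="#990000">TITLE</FONT></b> EVENT '
          . '<a href="http://www.mysite.com"target="_new">www.mysite.com</a> '
          . '[Ref #67579]</li></p>';

# Step 1: loosest pattern, just whatever sits between <p> and </p>.
my $step1 = qr{<p>(.*?)</p>}si;

# Step 2: narrow it down to the title inside the FONT element.
my $step2 = qr{<p>.*?<font[^>]*>\s*(.*?)\s*</font>}si;

# Step 3: also capture the event text between </b> and the link.
my $step3 = qr{<p>.*?<font[^>]*>\s*(.*?)\s*</font>\s*</b>\s*(.*?)\s*<a\b}si;

# Check what each step captures before tightening the pattern further.
my $n = 0;
for my $re ($step1, $step2, $step3) {
    $n++;
    my @captures = $event =~ $re;
    my $shown = @captures ? join(' | ', map { "<$_>" } @captures) : 'no match';
    print "step $n: $shown\n";
}
```

Running it prints the captures of each step, so a pattern that stops matching is noticed right at the step that broke it. The third step should report TITLE and EVENT as its two captures.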