James Edward Gray II wrote:
On May 17, 2004, at 11:16 PM, Andrew Gaffney wrote:

Roman Hanousek wrote:

Hi all, I have a bunch of files that contain code like this:

    <ps:img page="/images/portal/arrow_down.gif" border="0"
                   width="9" height="6"
                   alt="${string['lists.list.sort.ascending.alt']}"
                   title="${string['lists.list.sort.ascending.alt']}" />

What I am trying to do is match <ps:img and its closing />, then check that this piece of code contains an alt= attribute. If it doesn't, I want to print the lines where it's missing to the screen or to a file.


while($input =~ m|<ps:img .+(alt\s*=\s*\".+\")?.+/>|sgc) {
  print "Missing ALT\n" if(! defined $1);
}

That doesn't give you line numbers, but it does give you an idea of where to start.


Be careful. Matching HTML-style markup with regexen is surprisingly tricky, and I suspect the version above would not work well in many instances. Remember that .+ is super greedy, even more so since the /s modifier lets it swallow \n as well. The pattern above will match the first <ps:img, swallow the rest of ALL the data, and then back up until it finds the last />. That's probably not going to work out too well in many cases.
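To see the greediness problem concretely, here is a small sketch comparing the greedy .+ with the non-greedy .+? on the same input, both with /s. The two-tag sample string is made up for illustration; the first tag has an alt and the second does not.

```perl
use strict;
use warnings;

# Hypothetical sample: two tags on separate lines.
my $input = <<'END';
<ps:img page="/a.gif"
        alt="down" />
<ps:img page="/b.gif" border="0" />
END

# Greedy: with /s, .+ may cross newlines, so it runs to the end of the
# string and backtracks only as far as the LAST "/>".
my @greedy = $input =~ m{<ps:img .+/>}sg;

# Non-greedy: .+? stops at the FIRST "/>" after each "<ps:img".
my @lazy = $input =~ m{<ps:img .+?/>}sg;

printf "greedy: %d match(es)\n", scalar @greedy;   # one match spanning both tags
printf "lazy:   %d match(es)\n", scalar @lazy;     # one match per tag
```

Even .+? is only safer, not safe: if a tag were closed with a plain > instead of />, the lazy match would still run on into the next tag, which is one reason a negated character class such as [^>]+ tends to behave better here.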

Depending on how much is known about the tags, you might have more luck with a pattern like:

m!<ps:img([^>]+)/>!g

From there it's pretty easy to check $1 for an alt="...", or whatever.
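Putting that pattern to work, a minimal sketch of the whole check might look like the following. It assumes attribute values never contain a >, that alt values are double-quoted, and it reports the line on which each offending <ps:img starts. The sub name missing_alt_lines is just an illustrative choice, not anything from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: return the starting line number of every
# <ps:img ... /> tag in $text that has no alt="..." attribute.
# Assumes attribute values never contain a '>' character.
sub missing_alt_lines {
    my ($text) = @_;
    my @lines;
    while ($text =~ m!<ps:img([^>]+)/>!g) {
        my $attrs = $1;       # everything between "<ps:img" and "/>"
        my $start = $-[0];    # offset where this tag begins
        next if $attrs =~ /\salt\s*=\s*"[^"]*"/;
        # Count the newlines before the tag to get a 1-based line number.
        my $before = substr($text, 0, $start);
        push @lines, 1 + ($before =~ tr/\n//);
    }
    return @lines;
}

# Check each file named on the command line.
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $text = do { local $/; <$fh> };    # slurp, so multi-line tags stay whole
    close $fh;
    print "$file: line $_: <ps:img> without alt\n" for missing_alt_lines($text);
}
```

You would run it as something like perl check_alt.pl *.jsp (the script name is hypothetical). Slurping each file keeps tags that span several lines intact, and $-[0] supplies the match offset needed to turn a position back into a line number.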

Hope that helps.

Doesn't the 'gc' modifier make the whole thing less greedy? As a side effect of continuation, doesn't it try to match as many times as possible?
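A quick experiment suggests the answer is no: /g only changes where the next search starts, and /c only changes what happens to pos() after a failed match. Neither changes how much each .+ consumes. The sample string below is made up for illustration.

```perl
use strict;
use warnings;

my $input = qq{<ps:img a="1" />\n<ps:img b="2" />\n};

# /g means "start the next search where the last match ended".
# It does not change how much each individual .+ swallows.
my $count = 0;
while ($input =~ m{<ps:img .+/>}sg) {
    $count++;
    printf "match %d ended at offset %d\n", $count, pos($input);
}
print "greedy pattern matched $count time(s)\n";    # 1: the single match covers both tags

# /c only matters when a match FAILS: without it, pos() is reset;
# with it, pos() stays where the last successful match left it.
my $found = $input =~ m{<ps:img}g;          # pos is now just past the first "<ps:img"
my $again = $input =~ m{no-such-text}gc;    # fails, but /c keeps pos
print "pos after failed /gc: ", pos($input), "\n";    # still 7
```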


--
Andrew Gaffney
Network Administrator
Skyline Aeronautics, LLC.
636-357-1548

