On May 17, 2004, at 11:16 PM, Andrew Gaffney wrote:
Roman Hanousek wrote:
Hi All,

I have a bunch of files that contain code like this:

<ps:img page="/images/portal/arrow_down.gif" border="0" width="9" height="6" alt="${string['lists.list.sort.ascending.alt']}" title="${string['lists.list.sort.ascending.alt']}" />

What I am trying to do is match <ps:img and this />, then check that this piece of code contains an alt= attribute. If it doesn't, print the lines where it's missing to screen or file.
while ($input =~ m|<ps:img .+(alt\s*=\s*\".+\")?.+/>|sgc) {
    print "Missing ALT\n" if (! defined $1);
}
That doesn't give you line numbers, but it does give you an idea of where to start.
Be careful. Matching HTML-style markup with regexen is surprisingly tricky. I suspect the version above would not work well in many instances. Remember .+ is super greedy, more so since you allow it to swallow \n as well. The above pattern should match the first <ps:img, swallow the rest of ALL the data and then back up until it can find a />. That's probably not going to work out too well in many cases.
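To make the greediness concrete, here is a small sketch. The sample data in $input is made up for illustration, and the second pattern assumes no attribute value contains a literal ">". It counts how many matches each style of pattern finds in two-tag input:

```perl
use strict;
use warnings;

# Hypothetical sample: two tags, only the first has an alt attribute.
my $input = qq{<ps:img page="/a.gif" alt="A" />\n<p>text</p>\n<ps:img page="/b.gif" />\n};

# Greedy .+ with /s runs to the end of the data, then backs up to the LAST "/>",
# so both tags end up inside a single match.
my @greedy = $input =~ m|(<ps:img .+/>)|sg;

# A negated character class cannot cross a ">", so each tag matches separately.
my @per_tag = $input =~ m!(<ps:img[^>]+/>)!g;

printf "greedy:  %d match(es)\n", scalar @greedy;    # greedy:  1 match(es)
printf "per tag: %d match(es)\n", scalar @per_tag;   # per tag: 2 match(es)
```

With the greedy version, a missing alt on the second tag can never be reported on its own, because that tag is never matched by itself.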
Depending on how much is known about the tags, you might have more luck with a pattern like:
m!<ps:img([^>]+)/>!g
From there it's pretty easy to check $1 for an alt="...", or whatever.
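As a rough sketch of that check, which also reports line numbers: the helper name missing_alt_lines and the sample text are my own inventions, and it assumes the whole file has been slurped into one string and that no attribute value contains a ">".

```perl
use strict;
use warnings;

# Hypothetical helper: returns the starting line number of every
# <ps:img ... /> tag in $text that has no alt attribute.
sub missing_alt_lines {
    my ($text) = @_;
    my @lines;
    while ($text =~ m!<ps:img([^>]+)/>!g) {
        my $attrs = $1;       # save the capture before running another match
        my $start = $-[0];    # offset where this tag began
        next if $attrs =~ /\balt\s*=/;
        # Line number = 1 + number of newlines before the tag.
        my $before = substr($text, 0, $start);
        push @lines, 1 + ($before =~ tr/\n//);
    }
    return @lines;
}

# Made-up sample: the tag starting on line 3 lacks alt.
my $sample = qq{<ps:img page="/a.gif" alt="A" />\n<p>x</p>\n<ps:img page="/b.gif"\n  border="0" />\n};
print "Missing ALT at line $_\n" for missing_alt_lines($sample);
```

Because [^>] also matches newlines, a tag that is split across several lines is still found, and it is reported by the line it starts on.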
Hope that helps.
Doesn't the 'gc' modifier make the whole thing not as greedy? As a side effect of continuation, doesn't it try to match as many times as possible?
-- Andrew Gaffney Network Administrator Skyline Aeronautics, LLC. 636-357-1548