In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] (Douglass Franklin) writes:
>I'm trying to transform this html table to a colon-delimited flat-file
>database. This is what I have so far:
>
>HTML:
><tr><td class='bodyblack' width='50%'><a
>href='http://jsearch.usajobs.opm.gov/summary.asp?OPMControl=IC9516'
>class='jobrlist'><font size='2'>ACCOUNTANT
></font></a></td><td class='bodyblack' width='40%'>$24,701.00
> - $51,971.00
></td><td class='bodyblack'>INDEFINITE</td></tr>
><tr><td class='bodyblack'>CONTINENTAL U.S., US</td>
></tr><td class='bodyblack' colspan='3'>  </td></tr>
>
>Database Record (wanted):
>Accountant:$24,701.00 - $51,971.00:INDEFINITE:CONTINENTAL U.S., US
>
>Regex I have:
>$jobrecord =~ ^(<tr>)(<td class='bodyblack' width='50%'>)(.+)(  
></td></tr>)$
>
>However, this doesn't seem to be working. Please help.

What you pasted isn't Perl code; there are no regex delimiters. I assume
you had them and then go on to use $1 etc.

Let's take a look at the input. What you want is the non-white space
content between > and <, ignoring the final element. So:

    my @fields = $jobrecord =~ />\s*([^<]*[^<\s])/g;
    pop @fields;
    $fields[0] = ucfirst lc $fields[0];
    print join(":", map { tr/\n/ /; $_ } @fields), "\n";

We ignore leading white space by skipping past \s*. Then we get zero or
more things that aren't <, followed by a character that's not < or white
space, thereby getting at least one character and avoiding trailing white
space. Because the match is in list context with /g, @fields receives
every captured string in order. Then we pop off the final element, adjust
the case of the first element, and print everything out separated by :,
turning newlines into spaces along the way.

-- 
Peter Scott
http://www.perldebugged.com
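
For anyone who wants to try the approach end to end, here is a minimal
sketch that wraps the same regex in a small subroutine and runs it on the
sample rows from the question. The subroutine name html_to_record is
hypothetical, and the sketch assumes the last cell of the sample holds an
&nbsp; entity (which is what makes it the final captured element). It also
collapses any run of white space inside a field to a single space, a small
variation on the tr/\n/ / step above, so that a field wrapped across lines
does not end up with doubled spaces.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original post): turn the HTML rows
# for one job listing into a colon-delimited record.
sub html_to_record {
    my ($html) = @_;

    # In list context with /g, the match returns every captured string:
    # the text after each '>' up to the next '<', minus leading and
    # trailing white space.
    my @fields = $html =~ />\s*([^<]*[^<\s])/g;
    return '' unless @fields;

    # Assumption: the last captured field is the &nbsp; spacer cell.
    pop @fields;
    return '' unless @fields;

    # "ACCOUNTANT" becomes "Accountant".
    $fields[0] = ucfirst lc $fields[0];

    # Collapse newlines and other white-space runs inside each field.
    s/\s+/ /g for @fields;

    return join ':', @fields;
}

# Sample input from the question; the single-quoted heredoc keeps the
# dollar signs from being interpolated.
my $jobrecord = <<'END_HTML';
<tr><td class='bodyblack' width='50%'><a
href='http://jsearch.usajobs.opm.gov/summary.asp?OPMControl=IC9516'
class='jobrlist'><font size='2'>ACCOUNTANT
</font></a></td><td class='bodyblack' width='40%'>$24,701.00
 - $51,971.00
</td><td class='bodyblack'>INDEFINITE</td></tr>
<tr><td class='bodyblack'>CONTINENTAL U.S., US</td>
</tr><td class='bodyblack' colspan='3'>&nbsp;</td></tr>
END_HTML

print html_to_record($jobrecord), "\n";
```

This relies on the markup having no stray text between tags other than
the wanted fields; for anything less regular, a real HTML parser is the
safer choice.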