Hi Dale,
Here's a function I've used in a few projects and presentations (I
spoke about screen scraping most recently at WOTP) that returns a
two-dimensional array of matches and the subexpressions within each match:
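<cffunction access="private" returntype="array" name="REScrape">
    <cfargument type="string" name="regex" required="true">
    <cfargument type="string" name="source" required="true">
    <cfscript>
        var resultIndex = 1;
        var matches = arrayNew(1);
        var terms = 0;
        var result = REFind(arguments.regex, arguments.source, 1, true);
        // pos[1]/len[1] describe the whole match; the remaining elements
        // describe each parenthesised subexpression.
        while (result.pos[1] neq 0) {
            terms = arrayNew(1);
            for (resultIndex = 1; resultIndex le arrayLen(result.pos); resultIndex++) {
                if (result.pos[resultIndex] gt 0 and result.len[resultIndex] gt 0)
                    arrayAppend(terms, mid(arguments.source, result.pos[resultIndex], result.len[resultIndex]));
                else
                    // Subexpression took no part in the match (or matched nothing);
                    // mid() would reject a start position of 0.
                    arrayAppend(terms, "");
            }
            arrayAppend(matches, terms);
            // Resume after this match, stepping at least one character so a
            // zero-length match cannot loop forever.
            result = REFind(arguments.regex, arguments.source, result.pos[1] + max(result.len[1], 1), true);
        }
        return matches;
    </cfscript>
</cffunction>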
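
As a rough usage sketch (not tested against your page): assuming REScrape is
defined in the same template or component, and using a made-up `html` sample
in place of your real source, you could call it once for the rows and again
for the cells in each row. The row pattern is the one from your earlier
message; the cell pattern and the variable names are only illustrative.

```cfm
<!--- Hypothetical sample input; in practice html would hold the fetched page. --->
<cfset html = "<table><tr><td>a</td><td>b</td><td>c</td></tr><tr><td>d</td><td>e</td><td>f</td></tr></table>">

<!--- Each element of rows is an array: [1] is the whole <tr>...</tr> match,
      [2] is the first subexpression (the row's inner HTML). --->
<cfset rows = REScrape("<tr[^>]*>(.*?)</tr>", html)>

<cfloop from="1" to="#arrayLen(rows)#" index="r">
    <!--- Run the same helper over the row's inner HTML to pull out each cell. --->
    <cfset cells = REScrape("<td[^>]*>(.*?)</td>", rows[r][2])>
    <cfloop from="1" to="#arrayLen(cells)#" index="c">
        <cfoutput>Row #r#, cell #c#: #HTMLEditFormat(cells[c][2])#<br></cfoutput>
    </cfloop>
</cfloop>
```

The inner loop is where you would swap the cfoutput for your database insert.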
Cheers,
Robin
ROBIN HILLIARD
Chief Executive Officer
[EMAIL PROTECTED]
RocketBoots Pty Ltd
Level 11
189 Kent Street
Sydney NSW 2001
Australia
Phone +61 2 9323 2507
Facsimile +61 2 9323 2501
Mobile +61 418 414 341
www.rocketboots.com.au
On 24/11/2008, at 4:59 PM, Dale Fraser wrote:
> I think I have it sorted. I didn’t realise that reFind only returned
> the first occurrence, doh!
>
> Regards
> Dale Fraser
> http://learncf.com
> http://flexcf.com
>
>
> From: [email protected] [mailto:[EMAIL PROTECTED]
> On Behalf Of Blair McKenzie
> Sent: Monday, 24 November 2008 4:51 PM
> To: [email protected]
> Subject: [cfaussie] Re: Pull apart a html table
>
> You could add an XML declaration and parse it into an XML object.
> You'll still have to find the table in the HTML though.
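>
> A rough sketch of that idea, assuming the table markup has already been
> isolated and happens to be well-formed XML (real-world HTML often is not).
> The tableHtml sample and variable names are made up for illustration, and
> XmlParse() generally accepts a well-formed fragment even without the
> declaration:
>
> ```cfm
> <!--- Hypothetical sample; substitute the table you extracted from the page. --->
> <cfset tableHtml = "<table><tr><td>Cell 1.1</td><td>Cell 1.2</td></tr><tr><td>Cell 2.1</td><td>Cell 2.2</td></tr></table>">
>
> <!--- Parse into an XML document object; this throws if the markup is not well formed. --->
> <cfset doc = XmlParse(trim(tableHtml))>
>
> <!--- XPath for every tr element in the document (element names are case-sensitive). --->
> <cfset rows = XmlSearch(doc, "//tr")>
>
> <cfloop from="1" to="#arrayLen(rows)#" index="r">
>     <!--- XmlChildren holds the row's child elements, here the td cells. --->
>     <cfset cells = rows[r].XmlChildren>
>     <cfloop from="1" to="#arrayLen(cells)#" index="c">
>         <cfoutput>Row #r#, cell #c#: #HTMLEditFormat(cells[c].XmlText)#<br></cfoutput>
>     </cfloop>
> </cfloop>
> ```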
>
> Blair
> On Mon, Nov 24, 2008 at 4:45 PM, Steve Onnis
> <[EMAIL PROTECTED]> wrote:
>
> Would it be easier for you to convert it to a CSV format and process
> it from there?
>
>
>
> <cfscript>
>
> function TableToCSV () {
>     var table = arguments[1];
>
>     table = REReplaceNoCase(table, "[^[:print:]]", "", "ALL");
>     table = replaceNoCase(table, "</tr><tr>", chr(10), "ALL");
>     table = replaceNoCase(table, "</td><td>", """,""", "ALL");
>     table = replaceNoCase(table, "<td>", """", "ALL");
>     table = replaceNoCase(table, "</td>", """", "ALL");
>     table = REReplaceNoCase(table, "<(table|tbody|thead|tfoot|tr)([^>]*)>", "", "ALL");
>     table = REReplaceNoCase(table, "</(table|tbody|thead|tfoot|tr)([^>]*)>", "", "ALL");
>     return table;
> }
>
> </cfscript>
>
> <cfsavecontent variable="table">
> <table>
> <tr>
> <td>Cell 1.1</td>
> <td>Cell 1.2</td>
> <td>Cell 1.3</td>
> <td>Cell 1.4</td>
> </tr>
> <tr>
> <td>Cell 2.1</td>
> <td>Cell 2.2</td>
> <td>Cell 2.3</td>
> <td>Cell 2.4</td>
> </tr>
> </table>
> </cfsavecontent>
> <cfoutput>
> <pre>#HTMLEditFormat(TableToCSV(table))#</pre>
> </cfoutput>
>
>
>
> ________________________________
>
> From: [email protected] [mailto:[EMAIL PROTECTED]
> On Behalf
> Of Dale Fraser
> Sent: Monday, 24 November 2008 4:32 PM
> To: [email protected]
> Subject: [cfaussie] Re: Pull apart a html table
>
>
> I just need to get the content out. I know there is a fixed format to the
> tables: each row has three cells, and I need to extract the info from each
> cell and populate a database.
>
> I've been playing with regex to get all the rows to start with, but I'm
> having trouble. I have:
>
>
>
> <cfset result = reFind("<tr[^>]*>(.*?)</tr>", html, 1, true) />
>
> <cfdump var="#result#" />
>
>
>
> But it only returns 2 elements in the array and there are hundreds of rows.
>
>
>
> Regards
>
> Dale Fraser
> http://learncf.com
> http://flexcf.com
>
>
>
>
>
> From: [email protected] [mailto:[EMAIL PROTECTED]
> On Behalf
> Of Steve Onnis
> Sent: Monday, 24 November 2008 4:24 PM
> To: [email protected]
> Subject: [cfaussie] Re: Pull apart a html table
>
>
>
> what are you wanting to do with them?
>
>
>
> ________________________________
>
> From: [email protected] [mailto:[EMAIL PROTECTED]
> On Behalf
> Of Dale Fraser
> Sent: Monday, 24 November 2008 4:19 PM
> To: [email protected]
> Subject: [cfaussie] Pull apart a html table
>
> Is there an easy way to pull apart an HTML table?
>
>
>
> I have a heap of html where I need to loop through the html and get a
> specific table and then loop over the rows and columns.
>
>
>
> I could write all that code, but I feel like I would be reinventing the
> wheel. Is this something that could be done with a regex, or is it outside
> the scope of one?
>
>
>
> Regards
>
> Dale Fraser
> http://learncf.com
> http://flexcf.com