Jon,
Here is some code I've used in the past to parse/spider URLs (using CFHTTP,
but you can use whatever means you want).
<!--- Specify a URL to spider content from. --->
<CFSET TargetURL = "http://www.allaire.com/">
<!--- Use a TRY/CATCH block for HTTP connection failures. --->
<CFTRY>
<!--- All CFHTTP operations should use CFLOCK. --->
<CFLOCK
TYPE="EXCLUSIVE"
NAME="GetExternalURL"
TIMEOUT="10"
THROWONTIMEOUT="Yes">
<!--- Contact and retrieve the remote site's data. --->
<CFHTTP
METHOD="GET"
URL="#TargetURL#"
RESOLVEURL="No"
TIMEOUT="10"
THROWONERROR="Yes">
</CFLOCK>
<!--- Catch connection failures. --->
<CFCATCH TYPE="COM.ALLAIRE.COLDFUSION.HTTPFAILURE">
<!--- An exception has occurred, so throw an error. --->
<CFTHROW
MESSAGE="The URL is not obtainable at this time.">
</CFCATCH>
</CFTRY>
<!--- Begin the script for displaying the parsed data. --->
<CFSCRIPT>
/* Set a Boolean flag for exiting our loop. */
Exit = false;
/* Starting position for our search. */
Start = 1;
/* Set a pointer to our CFHTTP.FileContent data. */
Page = CFHTTP.FileContent;
/* Build a table for output. */
WriteOutput("<TABLE><TR><TD>HREF</TD><TD>Text</TD></TR>");
/* Loop through our data in search of hyperlinks. */
while(NOT Exit) {
/* Match any occurrence of a URL. */
Match = REFindNoCase(
"<A[[:print:]]+HREF[ ]?=[ ]?""?[ ]?([^"" ]+)[ ]?""?[[:print:]]*>([[:print:]]+)</A>", Page, Start, TRUE);
/* If a URL is found. */
if (Match.pos[1]) {
/* Get the destination of the hyperlink. */
HREF = Mid(Page, Match.pos[2], Match.len[2]);
/* Get the text description of the hyperlink. */
Text = Mid(Page, Match.pos[3], Match.len[3]);
/* Output the results in a new table row. */
WriteOutput("<TR><TD>#HREF#</TD><TD>#Text#</TD></TR>");
/* Increment the starting position for the next match. */
Start = Match.pos[1] + Match.len[1];
/* If no more matches are found, exit the loop. */
} else Exit = true;
}
/* Finish the table by closing it. */
WriteOutput("</TABLE>");
</CFSCRIPT>
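On Will's question about searching a variable instead of a fetched page: the loop only ever looks at Page, so it should work on any string you assign to it. Below is a rough sketch along those lines that also collects the HREFs into an array, as Jon suggested, rather than writing table rows. MyText and Links are names I made up for illustration, and the pattern is a simplified one (it uses [^>] and captures only the HREF value), so treat it as a starting point rather than a drop-in replacement.

```cfml
<!--- Hypothetical input: any string variable holding HTML. --->
<CFSET MyText = "<A HREF=""http://www.allaire.com/"">Allaire</A> and <A HREF=""http://www.cfcomet.com/"">CF Comet</A>">
<CFSCRIPT>
/* Array to hold each HREF found (Links is an illustrative name). */
Links = ArrayNew(1);
/* Starting position and loop flag, as in the original loop. */
Start = 1;
Exit = false;
while (NOT Exit) {
/* Simplified pattern (an assumption, not the original): capture the HREF value only. */
Match = REFindNoCase("<A[^>]+HREF[ ]?=[ ]?""?([^"" >]+)", MyText, Start, TRUE);
if (Match.pos[1]) {
/* Store the destination of the hyperlink. */
ArrayAppend(Links, Mid(MyText, Match.pos[2], Match.len[2]));
/* Continue searching after the end of this match. */
Start = Match.pos[1] + Match.len[1];
} else Exit = true;
}
</CFSCRIPT>
<!--- Show how many links were collected and list them. --->
<CFOUTPUT>#ArrayLen(Links)# link(s) found: #ArrayToList(Links)#</CFOUTPUT>
```

Keeping the results in an array means you can sort, de-duplicate, or loop over them later instead of committing to table output inside the search loop.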
Dain Anderson
Caretaker, CF Comet
http://www.cfcomet.com/
----- Original Message -----
From: "Jon Hall" <[EMAIL PROTECTED]>
To: "CF-Talk" <[EMAIL PROTECTED]>
Sent: Saturday, June 02, 2001 6:16 PM
Subject: Re: REFindnocase - Parsing URL's
> Simple, just delete the first line and change the name of your variable
> to 'h'.
>
> This program only parses out the whole <a href ...> tag though. In order to
> get just the actual url, I'd probably just stick all of the parsed href tags
> in another array then parse for href=.
>
> I am actually going to extend this program to do this anyway. So I'll make a
> follow-up post with the modified source. I really just needed to do this for
> a one-off program, and it has kinda morphed into something a little more,
> simply since it's a challenge ;-)
> If you have access to irc, I will be idling in #coldfusion on efnet. /nick
> flux0
>
> jon
> ----- Original Message -----
> From: "W Luke" <[EMAIL PROTECTED]>
> To: "CF-Talk" <[EMAIL PROTECTED]>
> Sent: Saturday, June 02, 2001 5:22 PM
> Subject: Re: REFindnocase - Parsing URL's
>
>
> > Jon,
> >
> > How might I change this to search inside a variable that contains the
> > text, and not a file as you have done?
> >
> > Will
> >
> >
> > --
> > Will
> > Free Advertising-=- www.localbounty.com
> > e: [EMAIL PROTECTED] icq: 31099745
> >
> >
> > ----- Original Message -----
> > From: "Jon Hall" <[EMAIL PROTECTED]>
> > Newsgroups: cf-talk
> > Sent: Saturday, June 02, 2001 9:57 PM
> > Subject: Re: REFindnocase - Parsing URL's
> >
> >
> > > Wow, now this is too much of a coincidence. I just opened up my email
> > > program to post a message saying I had just successfully written a program
> > > that parses URLs out of a document, and was just wondering if anyone had a
> > > better way to do it. Well here is how I did it.
> > >
> > > If anyone knows of a faster way I am definitely interested. I imagine
> > > regular expressions would be much faster...
> > > Cfscripting this would most likely make it faster too, but for readability I
> > > am leaving it in regular cfml for now.