Jon,

Here is some code I've used in the past to parse/spider URLs (using CFHTTP,
but you can retrieve the page however you like).

<!--- Specify a URL to spider content from. The variable is named
      TargetURL so it does not collide with ColdFusion's built-in URL scope. --->
<CFSET TargetURL = "http://www.allaire.com/">

<!--- Use a TRY/CATCH block for HTTP connection failures. --->
<CFTRY>

  <!--- All CFHTTP operations should use CFLOCK. --->
  <CFLOCK
      TYPE="EXCLUSIVE"
      NAME="GetExternalURL"
      TIMEOUT="10"
      THROWONTIMEOUT="Yes">

    <!--- Contact and retrieve the remote site's data. --->
    <CFHTTP
        METHOD="GET"
        URL="#TargetURL#"
        RESOLVEURL="No"
        TIMEOUT="10"
        THROWONERROR="Yes">
  </CFLOCK>

  <!--- Catch connection failures. --->
  <CFCATCH TYPE="COM.ALLAIRE.COLDFUSION.HTTPFAILURE">
    <!--- The connection failed, so throw an error with a friendlier message. --->
    <CFTHROW
        MESSAGE="The URL is not obtainable at this time.">
  </CFCATCH>
</CFTRY>

<!--- Begin the script for displaying the parsed data. --->
<CFSCRIPT>
/* Set a Boolean flag for exiting our loop. */
Exit = false;

/* Starting position for our search. */
Start = 1;

/* Copy the retrieved page into a local variable. Any string variable
   can be searched here, not just CFHTTP.FileContent. */
Page = CFHTTP.FileContent;

/* Build a table for output. */
WriteOutput("<TABLE><TR><TD>HREF</TD><TD>Text</TD></TR>");

/* Loop through our data in search of hyperlinks. */
while(NOT Exit) {

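/* Match any occurrence of a hyperlink. [^>] keeps each match inside a
   single tag and [^<]+ stops the link text at the next tag, so one match
   cannot swallow several links on the same line. Links whose text
   contains other tags (e.g. image links) are skipped. */
  Match = REFindNoCase(
    "<A[^>]+HREF[ ]?=[ ]?""?[ ]?([^"" >]+)[ ]?""?[^>]*>([^<]+)</A>",
    Page, Start, TRUE);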

/* If a URL is found. */
  if (Match.pos[1]) {

/* Get the destination of the hyperlink. */
      HREF = Mid(Page, Match.pos[2], Match.len[2]);

/* Get the text description of the hyperlink. */
      Text = Mid(Page, Match.pos[3], Match.len[3]);

/* Output the results in a new table row. */
      WriteOutput("<TR><TD>#HREF#</TD><TD>#Text#</TD></TR>");

/* Increment the starting position for the next match. */
      Start = Match.pos[1] + Match.len[1];

/* If no more matches are found, exit the loop. */
   } else Exit = true;
}

/* Finish the table by closing it. */
WriteOutput("</TABLE>");
</CFSCRIPT>
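For anyone who wants to try the same approach outside ColdFusion, here is a
rough Python sketch of the loop above: find each anchor tag, take the
destination and the link text from the two subexpressions, and carry on from
the end of the match (re.finditer does the Start bookkeeping for you). It is
an illustrative translation, not the CF code itself; the pattern, the
extract_links name and the sample HTML are assumptions, and like any regex
approach it will miss links whose text contains other tags.

```python
import re

# Hypothetical pattern, an adaptation of the CF regex rather than a copy:
# group 1 is the HREF value, group 2 is the link text. [^>] keeps the match
# inside one tag and [^<]+ stops the text at the next tag, so a single match
# never spans two links.
LINK_RE = re.compile(
    r'<a\b[^>]*?\bhref\s*=\s*["\']?([^"\'\s>]+)["\']?[^>]*>([^<]+)</a>',
    re.IGNORECASE,
)


def extract_links(page):
    """Return (href, text) pairs for every simple hyperlink in page."""
    return [(m.group(1), m.group(2).strip()) for m in LINK_RE.finditer(page)]


if __name__ == "__main__":
    html = ('<p><a href="http://www.allaire.com/">Allaire</a> and '
            "<A HREF='/docs/index.cfm' target=_top>Docs</A></p>")
    # Print each destination next to its link text, like the table above.
    for href, text in extract_links(html):
        print(href, text)
```

If the pages you spider are messy, a real HTML parser (such as Python's
html.parser module) will hold up better than any single regular expression.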

Dain Anderson
Caretaker, CF Comet
http://www.cfcomet.com/


----- Original Message -----
From: "Jon Hall" <[EMAIL PROTECTED]>
To: "CF-Talk" <[EMAIL PROTECTED]>
Sent: Saturday, June 02, 2001 6:16 PM
Subject: Re: REFindnocase - Parsing URL's


> Simple, just delete the first line and change the name of your variable
> to 'h'.
>
> This program only parses out the whole <a href ...> tag, though. To get
> just the actual URL, I'd probably stick all of the parsed href tags in
> another array and then parse them for href=.
>
> I am actually going to extend this program to do this anyway, so I'll make
> a follow-up post with the modified source. I really just needed this for
> a one-off program, and it has kinda morphed into something a little more,
> simply because it's a challenge ;-)
> If you have access to IRC, I will be idling in #coldfusion on EFnet
> (/nick flux0).
>
> jon
> ----- Original Message -----
> From: "W Luke" <[EMAIL PROTECTED]>
> To: "CF-Talk" <[EMAIL PROTECTED]>
> Sent: Saturday, June 02, 2001 5:22 PM
> Subject: Re: REFindnocase - Parsing URL's
>
>
> > Jon,
> >
> > How might I change this to search inside a variable that contains the
> > text, rather than a file as you have done?
> >
> > Will
> >
> >
> > --
> > Will
> > Free Advertising-=- www.localbounty.com
> > e: [EMAIL PROTECTED]  icq: 31099745
> >
> >
> > ----- Original Message -----
> > From: "Jon Hall" <[EMAIL PROTECTED]>
> > Newsgroups: cf-talk
> > Sent: Saturday, June 02, 2001 9:57 PM
> > Subject: Re: REFindnocase - Parsing URL's
> >
> >
> > > Wow, now this is too much of a coincidence. I just opened up my email
> > > program to post a message saying I had just successfully written a
> > > program that parses URLs out of a document, and was wondering if
> > > anyone had a better way to do it. Well, here is how I did it.
> > >
> > > If anyone knows of a faster way, I am definitely interested. I imagine
> > > regular expressions would be much faster...
> > > CFScripting this would most likely make it faster too, but for
> > > readability I am leaving it in regular CFML for now.
> >
> >
> >
> >
>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Archives: http://www.mail-archive.com/cf-talk@houseoffusion.com/
