Jillian,

Have you tried using the Verity K2 Spider on your site?  It's a full web
spider that follows every link on your site and indexes the content,
whether it is static or dynamic.
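
If the spider is more than you need, one alternative might be to let
cfindex walk the directories itself.  The sketch below is untested and
makes assumptions: it reuses the DirList, SitePathRoot and SiteWebRoot
form fields and the "epipages" collection from your code, and the
DirUrl variable and the extension list are only illustrative.  Records
indexed with type="path" are keyed by file path, not by your database
ID, so you may want a separate collection for them.

```cfm
<!--- Sketch only: the form fields and the "epipages" collection come from
      the quoted code; DirUrl and the extension list are assumptions. --->
<cfloop list="#Form.DirList#" index="Dir">
    <!--- Map the file-system directory to its URL prefix --->
    <cfset DirUrl = ReplaceNoCase(Dir, Form.SitePathRoot, Form.SiteWebRoot)>
    <!--- recurse="yes" asks cfindex to include all subdirectories of Dir --->
    <cfindex action="update"
             collection="epipages"
             type="path"
             key="#Dir#"
             urlpath="#DirUrl#"
             extensions=".htm,.html"
             recurse="yes">
</cfloop>
```

Since DirList is already a comma-separated list, indexing more than one
directory would just mean entering several paths in that field.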

Jeff

----- Original Message -----
From: "Jillian Carroll" <[EMAIL PROTECTED]>
To: "CF-Talk" <[EMAIL PROTECTED]>
Sent: Thursday, March 13, 2003 9:01 AM
Subject: Search Indexing


Good Morning!

I know I'm close here... but I'm having a few problems trying to get my
search engine to index my static pages.  The code below works... but I
need to evolve it to do two more things, and I can't find anything
through Google or the Macromedia forums.

My questions: How can I evolve my code so it will not only index the
directory specified, but it will index all of the subdirectories of that
directory?  AND how can I index more than one directory?  Is there a way
to do that without just duplicating my entire loop?

My code is pasted below.  Thank you!!

--
Jillian

*** *** ***

<!---Form Processing--->
<cfif isdefined("Form.Action")>
<cfswitch expression="#Form.Action#">
<cfcase value="Update">
<cfquery name="delete_existing" datasource="#DSN#">
DELETE from indexedpages
</cfquery>

<!--- This variable will hold all the files found --->
<cfset FileList = "">

<!--- Note that filter is optional for "mixed" sites --->
<!--- This loop collects all the files into one list --->
<cfloop list="#Form.DirList#" index="Dir">

<cfdirectory action="list"
directory="#Dir#"
name="IndexList">

<cfoutput query="IndexList">
<cfif IndexList.type is "file">
<cfset FileList = listappend(FileList,Dir&IndexList.name)>
</cfif>
</cfoutput>
</cfloop>

<!---This loop reads/parses and inserts each file--->
<cfloop list="#FileList#" index="File">
<cffile action="read" file="#File#" variable="ParseMe">
<cfset Title="Untitled">

<!---Fetch Title--->
<cfif REFindNoCase("<h[0-9]",ParseMe,1) is not 0>
<cfset start=find(">",ParseMe,REFindNoCase("<h[0-9]",ParseMe,1)) + 1>
<cfset end=find("<",ParseMe,start)>
<cfset length = end - start>
<cfset title=mid(ParseMe,start,length)>
</cfif>

<!---Remove Common tags--->
<cfset ParseMe = REReplaceNoCase(ParseMe,"<[^>]*>","","all")>

<!---Remove Noise Words--->
<cfset NoiseWords = "a,an,and,at,as,are,all,be,but,by,can,do,for,get,got,here,I,if,it,is,in,like,may,not,our,or,of,on,that,then,the,they,there,to,which,we,you,your">

<cfloop list="#NoiseWords#" index="Noise">
<cfset ParseMe = REReplaceNoCase(ParseMe,"[[:space:]]#Noise#[[:space:]]"," ","all")>
</cfloop>

<!---Remove Extra Space--->
<cfloop from="1" to="10" index="loop">
<cfset ParseMe = REReplaceNoCase(ParseMe,"[[:space:]]+[[:space:]]"," ","all")>
</cfloop>

<!---Resolve Web Root--->
<cfset WebPath = ReplaceNoCase(File,Form.SitePathRoot,Form.SiteWebRoot,"all")>
<cfset WebPath = ReplaceNoCase(WebPath,"\","/","all")>

<!---Insert Into Dbase--->
<cftry>
<cfquery name="insert_pages" datasource="#DSN#">
INSERT into indexedpages
(
webpath,
filepath,
title,
contents
)
VALUES (
'#WebPath#',
'#File#',
'#Title#',
'#ParseMe#'
)
</cfquery>

<cfcatch type="Database">
<cfoutput>
Database error for:
<br />title: <b>#Title#</b>
<br />file: <b>#File#</b>
<br />web path: <b>#WebPath#</b>
<br />
</cfoutput>
</cfcatch>
</cftry>
<!---End of File loop--->
</cfloop>

<!---Update the collection--->
<cfquery datasource="#DSN#" name="getContents">
SELECT id,
webpath,
filepath,
title,
contents
FROM indexedpages
</cfquery>

<cftry>
<cfindex
action="update"
collection="epipages"
query="getContents"
key="ID"
title="title"
type="Custom"
body="Contents"
custom1="WebPath">
<!---Only reached if cfindex did not throw--->
<br />Collection has been updated successfully.

<cfcatch type="any">
<br />Sorry, an error occurred while trying to update your collection.
Check that the collection exists.
</cfcatch>

</cftry>
<!---End of Update Case--->
</cfcase>
</cfswitch>
</cfif>

<!--- Action Form --->
<cfoutput>
<form action="#ThisFileName#" method="post">
<table border="0" cellpadding="4" cellspacing="0">

<tr>
<td>List of Directories to Index:</td>
<td><input type="text" size="50" name="DirList" value="/data/aliases/epi/pdfshadow/"></td>
</tr>
<tr>
<td>
File Path to Site Root:
<br />(Trailing slashes are required)
</td>
<td valign="top"><input type="text" size="50" name="SitePathRoot" value="/data/aliases/epi/pdfshadow/"></td>
</tr>
<tr>
<td>
URL Path to Site Root:
<br />(ie http://www.yoursite.com/)

<br />(Trailing slashes are required)
</td>
<td valign="top"><input type="text" size="50" name="SiteWebRoot" value="http://epidev.lights.com/"></td>
</tr>
<tr>
<td colspan="2"><input type="submit" name="action" value="Update"></td>
</tr>
</table>
</form>
</cfoutput>


