OK - Verity searches *best* on text files, and I gather you will be
creating a collection of text files, but you also have to consider the
"display" of the content (i.e. sending the searcher to the correct page with
the correct formatting). What we have done historically is this five-stage
process:

1) create a Verity collection in the server administrator
2) write a local "indexing" page that takes the collection and compiles it
with the correct URL/path functions (in this case we usually have our
website road map in mind, with section1.cfm text files in the section1
folder, section2.cfm text files in the section2 folder, etc.)
3) create a "search assist" table with the following columns:
        a) id - unique reference
        b) front - any front path you may need (e.g.
"section1/section1.cfm?var=1" would be held as "section1/" here)
        c) middle - the search/clipped variable, used for cross-checking
against the Verity URL results (e.g. "section1/section1.cfm?var=1" would be
"section1" here)
        d) end - any end variables or includes you may need for that page
(e.g. "section1/section1.cfm?var=1" would be ".cfm?var=1" here)
        e) title - title of the page to show in the result output
4) write a search/results page which passes the search query through the
Verity collection to bring back the #url# variable, then strips the last 4
characters (the extension, e.g. ".cfm") and everything up to and including
the last "/", which gives you a clipped variable. Then select the row from
the search assist table whose middle column holds that clipped variable and
output the result (using the full #front##middle##end# as the path)
5) write a cfschedule task that re-indexes the collection at 2am (the
server's usual quiet period)
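As a rough illustration of the string handling in steps 3 and 4, here is a minimal Python sketch. Python stands in for the CFML you would actually write, and the function names clip_url and full_path are hypothetical. It assumes Verity's #url# comes back as a plain file path ending in a four-character extension such as ".cfm" or ".htm":

```python
def clip_url(url: str) -> str:
    """Reduce a Verity result URL to the 'middle' key of the search assist table.

    Assumes the URL ends in a 4-character extension such as ".cfm" or ".htm".
    """
    without_ext = url[:-4]                   # drop the last 4 characters, e.g. ".cfm"
    return without_ext.rsplit("/", 1)[-1]    # keep only what follows the last "/"


def full_path(front: str, middle: str, end: str) -> str:
    """Rebuild the link shown to the searcher as #front##middle##end#."""
    return front + middle + end


if __name__ == "__main__":
    key = clip_url("section1/section1.cfm")
    print(key)
    print(full_path("section1/", key, ".cfm?var=1"))
```

The fixed 4-character cut breaks silently on a ".html" file because it would leave "section1.". If your extensions vary, os.path.splitext or a regular expression is the safer choice.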

This methodology allows you to (for example) use a JavaScript pop-up in
your output, e.g.
href="javascript:displayPopUp('section1/section1.htm',400,600)" where front
= "javascript:displayPopUp('section1/", middle = "section1" and end =
".htm',400,600)", etc. It's by no means the fastest method, as you have to
query the Verity collection AND then query the SQL table (use
blockfactor=100 on that query, though), but it's the most comprehensive if
your website has a complex structure, as it allows pre- and post-variables
to be added to the result URL path (which other searches won't allow).
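The second half of step 4 can be sketched in Python as well. This hedged sketch uses an in-memory SQLite table as a stand-in for the real search assist table, and the table name search_assist, the sample rows and both helper functions are hypothetical. It looks up all clipped keys with a single IN query rather than one query per hit, then puts the rows back into Verity's result order. The second row applies the pop-up idea to a hypothetical section2 page:

```python
import sqlite3


def build_demo_table() -> sqlite3.Connection:
    """Create a throwaway search assist table with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    # "end" is quoted because END is an SQL keyword.
    conn.execute(
        'CREATE TABLE search_assist (id INTEGER PRIMARY KEY, front TEXT, '
        'middle TEXT, "end" TEXT, title TEXT)'
    )
    conn.executemany(
        'INSERT INTO search_assist (front, middle, "end", title) VALUES (?, ?, ?, ?)',
        [
            ("section1/", "section1", ".cfm?var=1", "Section 1"),
            ("javascript:displayPopUp('section2/", "section2",
             ".htm',400,600)", "Section 2 (pop-up)"),
        ],
    )
    return conn


def results_for(conn: sqlite3.Connection, clipped_keys: list) -> list:
    """Map clipped Verity keys to (title, link) pairs, keeping Verity's order."""
    if not clipped_keys:
        return []
    placeholders = ",".join("?" for _ in clipped_keys)
    rows = conn.execute(
        'SELECT middle, title, front || middle || "end" FROM search_assist '
        f"WHERE middle IN ({placeholders})",
        clipped_keys,
    ).fetchall()
    by_key = {middle: (title, link) for middle, title, link in rows}
    # Verity hits with no matching assist-table row are skipped.
    return [by_key[k] for k in clipped_keys if k in by_key]


if __name__ == "__main__":
    for title, link in results_for(build_demo_table(), ["section2", "section1"]):
        print(title, "->", link)
```

Keeping front, middle and end as separate columns means the pop-up wrapper lives entirely in the data. The results page only ever concatenates the three columns.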

We use a similar methodology on a 40 MB collection/site, and the average
results page comes back in about 1265 milliseconds (which is chuggy).

In answer to your other query: yes, the AltaVista engine is a server-based search.

HTH

J

-----Original Message-----
From: Michael Lugassy [mailto:[EMAIL PROTECTED]]
Sent: 14 May 2001 18:53
To: CF-Talk
Subject: Re: best search


"As a rule of thumb "use the right tool for the job", don't just use a tool
because it is there."

This is probably the best sentence I've heard this week! (Come to think of
it, it's only Monday.. :)
I asked for help and got it, but actually more questions arise; I'll try to
describe them:

1. The content would be txt, pure ASCII text only! Actually, I just
finished the mechanism which takes the HTML files from all of the sites I
want to search and then removes all HTML tags, JavaScript code and so on.

2. How much is considered BIG in the "lite" version of Verity? 100 MB, 200 MB
of content? That's about the amount I have.

3. Verity would be great for me to display the data more efficiently along
with CF tags. I know CF, and I know Verity is easy to use, but I know they
MUST be slower than Microsoft solutions.
(BTW: IS THERE ANY DIFFERENCE? Is MS Search = SQL Full-Text?)

What would be the best way to go if
I want the search to be as fast as possible, YET:
1. I need an easy way to get the results to good-looking HTML format.
2. and maybe an easy way to still incorporate CF tags.

My database design is as follows:
uID = URL ID
uLINK = full URL location (e.g. http://www.cftalk.com/whatis/cf.html)
uCONTENT = txt content I fetched from the site earlier

Do I actually need to save this data in tables, or can I just search
directly on the txt files?
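For comparison, option 1 from the original question (store the text in a table and search it with plain SQL) can be sketched against the uID/uLINK/uCONTENT design above. This Python sketch uses an in-memory SQLite database purely for illustration, and the table name pages and both helper functions are hypothetical. A LIKE '%term%' query has to scan every row's content. That full scan is the main reason a full-text index (Verity, SQL Server Full-Text Search) tends to win as the content grows:

```python
import sqlite3


def make_pages_db(pages: list) -> sqlite3.Connection:
    """Load (uLINK, uCONTENT) pairs into a table shaped like the design above."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (uID INTEGER PRIMARY KEY, uLINK TEXT, uCONTENT TEXT)")
    conn.executemany("INSERT INTO pages (uLINK, uCONTENT) VALUES (?, ?)", pages)
    return conn


def search_pages(conn: sqlite3.Connection, term: str) -> list:
    """Naive substring search: every row's uCONTENT is scanned.

    SQLite's LIKE is case-insensitive for ASCII by default. A '%' or '_' in
    the term would act as a wildcard, so escape them in real use.
    """
    cur = conn.execute(
        "SELECT uLINK FROM pages WHERE uCONTENT LIKE ? ORDER BY uID",
        (f"%{term}%",),
    )
    return [link for (link,) in cur]


if __name__ == "__main__":
    db = make_pages_db([
        ("http://www.cftalk.com/whatis/cf.html", "ColdFusion is a tag-based language"),
        ("http://www.example.com/other.html", "nothing relevant here"),
    ])
    print(search_pages(db, "coldfusion"))
```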

And a final note: you mentioned the AltaVista engine. Do they offer server
search engines too?

Thanks a lot guys!! You'll be the first to run search queries on my
engine :)

Michael.


----- Original Message -----
From: "James Maltby" <[EMAIL PROTECTED]>
To: "CF-Talk" <[EMAIL PROTECTED]>
Sent: Monday, May 14, 2001 6:15 PM
Subject: RE: best search


> It really depends on how your content is being displayed. If it's
> flat-file, then Verity is OK; if it's a combination of SQL and Verity,
> then pass your query through both the database and Verity; if it's all
> held in SQL, then rely on SQL/Verity collections (faster than plain SQL
> requests), although database searches will depend on how often the data
> is updated. A quick rule we use for template-based sites is to
> Verity-index the flat-file content, then build a "link" database holding
> file names and page titles, which is used to display the URL links to the
> content (your option 2).
>
> However, having said the above, the main downfall of Verity is that it
> takes a f*** load of time to index large numbers of pages, and the
> collections can become corrupt, so it may be worthwhile using something
> completely different (AltaVista supply a free search engine, as do Lycos
> and Atomz). What you have to remember is that Verity with CF is a "lite"
> version and as such is not too hot on larger sites.
>
> As a rule of thumb "use the right tool for the job", don't just use a tool
> because it is there.
>
> J
>
> "The Force is strong in this one..."
> - Darth Vader
>
> -----Original Message-----
> From: Michael Lugassy [mailto:[EMAIL PROTECTED]]
> Sent: 14 May 2001 17:45
> To: CF-Talk
> Subject: best search
>
>
> What would be the best option for searching text-based content:
>
> 1. insert content into database, search using SQL statements
> 2. insert content into database, search using VERITY queried collections
> 3. VERITY search, collections will search directly on 1000s of txt files
> 4. Microsoft TextSearch Engine
> 5. SQL Full Text Search
>
> Any help, please..
>
>
> -Michael.
>

Archives: http://www.mail-archive.com/cf-talk@houseoffusion.com/