Funny you should mention this -- we got this hit from Google the other day:

http://www.as.columbia.edu/test404response2057687042.html


Now, since we have no page by that name, I'm guessing that Google is dealing
with exactly the same problem that you are, and that their process goes
something like this:

1. Fire off a bad request (complete with random number).
2. Store the response as your prototypical 404 page.
3. When making other requests, compare the response to the prototypical 404
   page. If it more or less matches, don't add it to the database.

I thought it seemed a rather neat system (if in fact that is what they were
doing).
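
The steps above can be sketched roughly as follows. This is a minimal Python
sketch and not Google's actual method, which I can only guess at. The function
names, the 0.8 threshold, and the use of difflib to decide "more or less
matches" are all my own assumptions. Fetching the pages is left to whatever
HTTP client the spider already uses.

```python
import difflib
import random


def probe_url(base_url):
    """Build a URL that should not exist on the site (hypothetical helper).

    Mirrors the request in our logs: a fixed prefix plus a random number.
    """
    number = random.randint(0, 2**31 - 1)
    return f"{base_url.rstrip('/')}/test404response{number}.html"


def looks_like_404(body, prototype_404, threshold=0.8):
    """Return True when body more or less matches the stored 404 page.

    threshold is an arbitrary assumption; tune it against real pages.
    """
    matcher = difflib.SequenceMatcher(None, prototype_404, body, autojunk=False)
    return matcher.ratio() >= threshold
```

The spider would fetch probe_url(site) once, keep the response body as
prototype_404, and then skip any later page for which looks_like_404(body,
prototype_404) is true.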

Michael Caulfield


-----Original Message-----
From: Dave Watts [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, May 09, 2001 10:30 AM
To: CF-Talk
Subject: RE: Spider 404s


> I'm building a little spider in CF and want to check each 
> page as it's retrieved to determine whether it contains the 
> expected contents.
> 
> Seems like looking for " 404 " in CFHTTP.FileContent won't 
> cover everything. Does anyone have a good working list of 
> words/phrases that can be used to verify links and ensure 
> they're still active?

Looking for "404" in CFHTTP.FileContent won't help you if the web server
returns an HTTP status code of 404. Instead, you might look at the variable
CFHTTP.StatusCode if you're running CF 4.5.x. Actually, this variable might
be present in earlier versions, but I don't know for sure.

Incidentally, if you request a non-existent CF page, you won't get back a
404 status code, even though the page itself will say "HTTP/1.0 404 Object
Not Found". The server will actually return a 200 status code, which
typically means "everything's OK". So, your spider may need to take this
into account.
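
To make the two cases above concrete, here is a minimal sketch in Python
rather than CF. The function name, the marker tuple, and the choice to treat
every other status code as "not confirmed missing" are assumptions for
illustration. It checks the status code first and only falls back to scanning
the body when the server claims 200.

```python
def link_is_dead(status_code, body, soft_404_markers=("404 Object Not Found",)):
    """Decide whether a fetched page is missing (hypothetical helper).

    status_code is the numeric HTTP status; body is the response text.
    soft_404_markers is an assumed list of phrases that error pages contain.
    """
    # A real 404 status is conclusive, whatever the body says.
    if status_code == 404:
        return True
    # A 200 can still be an error page, so look for known phrases in the body.
    if status_code == 200:
        lowered = body.lower()
        return any(marker.lower() in lowered for marker in soft_404_markers)
    # Other codes (redirects, server errors) are not treated as missing here.
    return False
```

For example, link_is_dead(200, page_text) returns True when page_text contains
"HTTP/1.0 404 Object Not Found", even though the status code says everything
is fine.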

Dave Watts, CTO, Fig Leaf Software
http://www.figleaf.com/
voice: (202) 797-5496
fax: (202) 797-5444