Be careful about setting all tokens to 1 if your Web site contains
sensitive information and you are passing the tokens in the URL.
Every user who arrives through a search-engine link carrying those
tokens will be treated by ColdFusion as the same user, which is a
serious problem. If Bob comes in with token=1 and logs in as himself,
and then Mary comes in with token=1, the server will think she is Bob
and that he is already logged in, so Mary will have access to
everything Bob has access to. If Bob has filled a shopping cart, Mary
will see Bob's shopping cart. I would expect the major search engines
to strip session tokens from the URLs they index, but if one doesn't,
you run into problems.
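To make the failure concrete, here is a toy model in Python (chosen only
for illustration). It assumes, hypothetically, a session store that looks
sessions up by the (CFID, CFTOKEN) pair alone. This is not ColdFusion's
real implementation, but it shows why a fixed 1/1 pair collapses every
visitor into one session.

```python
from typing import Dict, Tuple

# Toy session store keyed only by (CFID, CFTOKEN). Hypothetical model,
# not how ColdFusion stores sessions internally.
sessions: Dict[Tuple[str, str], dict] = {}


def get_session(cfid: str, cftoken: str) -> dict:
    """Return the session for this token pair, creating it if needed."""
    return sessions.setdefault((cfid, cftoken), {})


# Bob follows a search-engine link carrying CFID=1&CFTOKEN=1 and logs in.
bob = get_session("1", "1")
bob["user"] = "Bob"
bob["cart"] = ["laptop"]

# Mary follows another indexed link with the same fixed tokens.
mary = get_session("1", "1")

print(mary.get("user"))  # -> Bob
print(mary.get("cart"))  # -> ['laptop']
```

Both visitors resolve to the same dictionary, so anything Bob stores is
visible to Mary.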

The best solution probably depends on the particulars of the site and
what you are using session variables for. Many sites can get away with
having no sessions on the public part of the site and creating them
only after the user logs in, which a search engine never does.
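A minimal sketch of that idea, again in Python and with hypothetical
names (build_link, and the logged_in/cfid/cftoken keys). Tokens are added
to a URL only once someone has logged in, so the anonymous pages a
crawler sees never carry them. In ColdFusion terms this would roughly
mean calling URLSessionFormat() only for logged-in users. That mapping
is an assumption and has not been tested here.

```python
from typing import Optional
from urllib.parse import urlencode


def build_link(page: str, session: Optional[dict]) -> str:
    """Append CFID/CFTOKEN only for a logged-in session (hypothetical keys)."""
    if not session or not session.get("logged_in"):
        # Anonymous visitor or crawler: a clean URL that is safe to index.
        return page
    query = urlencode({"CFID": session["cfid"], "CFTOKEN": session["cftoken"]})
    # Respect any query string the page already has.
    separator = "&" if "?" in page else "?"
    return page + separator + query


if __name__ == "__main__":
    print(build_link("listings.cfm", None))  # -> listings.cfm
    member = {"logged_in": True, "cfid": "1001", "cftoken": "abc"}
    print(build_link("account.cfm", member))  # -> account.cfm?CFID=1001&CFTOKEN=abc
```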

The answer to your specific question depends on being able to
accurately identify bots, which is a challenge by itself. Michael
Dinowitz has expertise in this area. Maybe you can stick to the five
largest search engines? Maybe a commercial product like BrowserHawk
would help? If you can identify the bot, you can append a variable
that holds the token, as in mypage.cfm#optional_token#, where
optional_token is either an empty string (for bots) or the session
identifier (for everyone else).
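A rough sketch of that approach in Python. It assumes crawlers are
detected by simple User-Agent substring matching. The marker list and
function names are hypothetical, and User-Agent strings can be spoofed,
so treat this as an illustration rather than reliable bot detection.
page_url() plays the role of mypage.cfm#optional_token#.

```python
from typing import Optional

# Hypothetical markers for a few large crawlers; not exhaustive.
KNOWN_BOT_MARKERS = ("googlebot", "bingbot", "msnbot", "slurp", "baiduspider", "yandex")


def is_bot(user_agent: Optional[str]) -> bool:
    """Crude check: does the User-Agent contain a known crawler marker?"""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in KNOWN_BOT_MARKERS)


def optional_token(user_agent: Optional[str], cfid: str, cftoken: str) -> str:
    """Empty string for crawlers, session tokens for everyone else."""
    if is_bot(user_agent):
        return ""
    return f"?CFID={cfid}&CFTOKEN={cftoken}"


def page_url(page: str, user_agent: Optional[str], cfid: str, cftoken: str) -> str:
    """Plays the role of mypage.cfm#optional_token# from the message."""
    return page + optional_token(user_agent, cfid, cftoken)


if __name__ == "__main__":
    googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    browser = "Mozilla/5.0 (Windows NT 6.1; rv:1.9.2) Gecko/20100115 Firefox/3.6"
    print(page_url("mypage.cfm", googlebot, "1001", "abc"))  # -> mypage.cfm
    print(page_url("mypage.cfm", browser, "1001", "abc"))  # -> mypage.cfm?CFID=1001&CFTOKEN=abc
```

A more thorough detector, commercial or otherwise, would replace is_bot().
The URL-building side stays the same.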

Good luck,
Mike Chabot

On Wed, Feb 3, 2010 at 10:39 AM, Bob Hendren <[email protected]> wrote:
>
> I'm at an odd crossroads here. Up until now, I've kept my application 
> off-limits to search engines. I've used a couple of techniques found on Ben 
> Nadel's blog for giving them short sessions and such. Been working well.
>
> With respect to human users, I've been VERY diligent about using 
> URLSessionFormat to keep session variables across page requests with cookies 
> disabled. Also been working well.
>
> So here's my quandary - I now want to open up my application to allow search 
> engines to index. However, I've got session variables embedded everywhere in 
> my URLs due to URLSessionFormat()! So what's going to occur is this: the 
> robots will grab all of these URLs, index them, then pass them as hijacked 
> sessions through their results and I won't be able to track new visitors!
>
> I just ran across a recent mention by Michael Dinowitz of a technique for 
> setting CFID and CFToken to 1 whenever it's a bot, mentioned here:
>
> http://www.anujgakhar.com/2010/01/26/what-is-the-best-way-to-deal-with-spidersbotscrawlers/
>
> Bottom line: how can I make the URLs NOT pass the session management 
> variables when it's a search engine?
>
>
> ----------------------------
> Bob Hendren
> President/CEO
> ListingWare, Inc.
> http://www.listingware.com
> 800-867-4707
> [email protected]
>
> 

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|
Archive: 
http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:330390