Be careful about setting all tokens to 1 if your Web site contains sensitive information and you pass the token on the URL string. Every visitor who arrives through a search engine with that token will be treated by ColdFusion as the same user, which is rather bad. If Bob comes in with token=1 and logs in as himself, and Mary then comes in with token=1, the server will think she is Bob and already logged in, so Mary will have access to everything that Bob has access to. If Bob has filled a shopping cart, Mary will see Bob's shopping cart. I would expect the major search engines to be smart enough to strip user tokens from URLs, but if a search engine isn't, you run into problems.
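To make the Bob and Mary scenario concrete, here is a minimal, language-neutral sketch in Python. It is not ColdFusion code: the in-memory `sessions` store and the `get_session` and `log_in` helpers are hypothetical stand-ins for whatever the server does when it looks a session up by its URL token.

```python
# Hypothetical in-memory session store: token -> session data.
# This only imitates "look the session up by the token in the URL".
sessions = {}


def get_session(token):
    """Return the session for this token, creating it on first use."""
    return sessions.setdefault(token, {"user": None, "cart": []})


def log_in(token, user):
    """Mark the session behind this token as logged in as `user`."""
    get_session(token)["user"] = user


# Bob follows a search-engine link carrying token=1 and logs in.
log_in("1", "Bob")
get_session("1")["cart"].append("widget")

# Mary follows the same indexed link, so she presents the same token
# and is handed Bob's session.
mary_session = get_session("1")
print(mary_session["user"])  # Bob
print(mary_session["cart"])  # ['widget']
```

Any scheme that hands the same token to more than one visitor has this property, whatever the platform.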
The best solution probably depends on the particulars of the site and what you are using session variables for. Often a site can get away with not having sessions on the public part of the Web site and creating them only after the user logs in, which a search engine would never do. The answer to your specific question depends on being able to identify bots accurately, which is a challenge in itself. Michael Dinowitz has expertise in this area. Maybe you can stick with the five largest search engines? Maybe a commercial product like BrowserHawk would help? If you can identify the bot, then you can append a variable that contains the token, like mypage.cfm#optional_token#, where optional_token is either an empty string (for a bot) or a session identifier (for everyone else).

Good luck,
Mike Chabot

On Wed, Feb 3, 2010 at 10:39 AM, Bob Hendren <[email protected]> wrote:
>
> I'm at an odd crossroads here. Up until now, I've kept my application
> off-limits to search engines. I've used a couple of techniques found on Ben
> Nadel's blog for giving them short sessions and such. Been working well.
>
> With respect to human users, I've been VERY diligent about using
> URLSessionFormat to keep session variables across page requests with cookies
> disabled. Also been working well.
>
> So here's my quandary - I now want to open up my application to allow search
> engines to index. However, I've got session variables embedded everywhere in
> my URLs due to URLSessionFormat()! So what's going to occur is this: the
> robots will grab all of these URLs, index them, then pass them as hijacked
> sessions through their results and I won't be able to track new visitors!
>
> I just ran across a recent mention by Michael Dinowitz of a technique for
> setting CFID and CFToken to 1 whenever it's a bot, mentioned here:
>
> http://www.anujgakhar.com/2010/01/26/what-is-the-best-way-to-deal-with-spidersbotscrawlers/
>
> Bottom line: how can I make the URLs NOT pass the session management
> variables when it's a search engine?
>
>
> ----------------------------
> Bob Hendren
> President/CEO
> ListingWare, Inc.
> http://www.listingware.com
> 800-867-4707
> [email protected]
>

Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:330390
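
A rough sketch of the optional-token idea from the reply, written in Python as a language-neutral illustration rather than as ColdFusion code. Everything here is an assumption for the sake of the example: the short `BOT_SIGNATURES` list of User-Agent substrings, the helper names, and the sample CFID and CFTOKEN values. Real bot detection is harder than a substring match, as the reply notes.

```python
from urllib.parse import urlencode

# Hypothetical, deliberately short list of crawler User-Agent substrings.
# A production list would need to be longer and kept up to date.
BOT_SIGNATURES = ("googlebot", "bingbot", "slurp", "baiduspider", "yandex")


def is_probable_bot(user_agent):
    """Guess whether a request comes from a crawler, by User-Agent substring."""
    ua = (user_agent or "").lower()
    return any(signature in ua for signature in BOT_SIGNATURES)


def optional_token(user_agent, cfid, cftoken):
    """Return '' for probable bots, else a query string with the session ids."""
    if is_probable_bot(user_agent):
        return ""
    return "?" + urlencode({"CFID": cfid, "CFTOKEN": cftoken})


def build_link(page, user_agent, cfid, cftoken):
    """Append the optional token to a page name, as in mypage.cfm#optional_token#."""
    return page + optional_token(user_agent, cfid, cftoken)


bot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1)"
human_ua = "Mozilla/5.0 (Windows NT 6.1)"
print(build_link("mypage.cfm", bot_ua, "1234", "5678"))
# mypage.cfm
print(build_link("mypage.cfm", human_ua, "1234", "5678"))
# mypage.cfm?CFID=1234&CFTOKEN=5678
```

With this approach a crawler never sees a session identifier in a link, so it has nothing to index and hand to later visitors, while cookieless human visitors still get the token on every URL.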

