> We've just had one of our sites 'stolen' by someone in the Philippines
> using an application called iCollect... I got tracert logs etc and will
> be following up on this....
>
> In the meantime, does anyone know of any IP / USER_AGENT filters that I
> can just drop into an application file so that it protects sites against
> this sort of thing....
>
> If not, I'm thinking about writing something that will do this so that I
> can protect all the sites that we host using CF and I'd like opinions on
> how I would go about implementing it...
Unfortunately, I think your best bet is not to worry about it. The most important part of your site is the back-end functionality, which can't be stolen this way. There are two problems with any technical solution that you might implement. The first is that it may cause problems for legitimate users of your site. The second is that it can be circumvented by determined adversaries. While you may be able to minimize the first problem, the second one is insurmountable. All you can do is make it a bit more difficult.

What kind of problems might you cause legitimate users? First, you may end up denying them access to your site - they may be coming through the same proxy as the bad guy. This is pretty serious for an ecommerce site. Second, you may impede search engines, which may affect your page rank and as a result may affect the ability of people to find what they're looking for.

What kinds of technical solutions might you try, and how can they be circumvented? I can think of three approaches - limiting requests from a single IP address during a period of time, limiting requests to internal files based on HTTP_REFERER (this is typically done to prevent links to images or other internal files from external sites), or obfuscating the contents of your pages to defeat wget-style spiders, which read through a page looking for links to find out which pages to request next. Again, all three of these approaches can be circumvented pretty easily. I've run into all three of these sorts of things before, and have been able to overcome them with a little bit of work. As an expert witness in civil cases in the state of Maryland, I have occasionally had to capture copies of sites in their entirety at a specific time, and have had to work around these issues. The last one is the most difficult, actually, since most off-the-shelf spidering tools don't understand JavaScript.
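For what it's worth, the first two approaches are only a few lines of code in any language. In a CF app this logic would typically live in Application.cfm using CGI.REMOTE_ADDR and CGI.HTTP_REFERER; here's a language-agnostic sketch in Python instead, just to show the shape of it. The limits and the allowed referer are made-up numbers, not recommendations, and the comments note exactly why each check is easy to defeat.

```python
import time
from collections import defaultdict, deque

# Illustrative values only - these names and numbers are assumptions,
# not taken from any real filter product.
MAX_REQUESTS = 60        # requests allowed per window, per IP
WINDOW_SECONDS = 60      # sliding window length
ALLOWED_REFERER = "www.example.com"

_hits = defaultdict(deque)   # remote IP -> timestamps of recent requests


def allow_request(remote_ip, referer, now=None):
    """Return False if this IP is over the rate limit, or if an
    internal file is being requested with a foreign Referer.

    Both checks are easily circumvented: many users share one IP
    behind a proxy (so you block legitimate visitors), and the
    Referer header can be forged or simply left blank.
    """
    now = time.time() if now is None else now

    # Sliding-window rate limit: drop timestamps that have aged out,
    # then refuse if the window is already full.
    window = _hits[remote_ip]
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) >= MAX_REQUESTS:
        return False

    # Referer check: many legitimate clients send no Referer at all,
    # so only reject referers that are present and clearly foreign.
    if referer and ALLOWED_REFERER not in referer:
        return False

    window.append(now)
    return True
```

Note that the very first thing this sketch does wrong is the thing described above: a whole office behind one proxy shares a single REMOTE_ADDR, so the sixty-first person through that proxy gets locked out along with the spider.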
Fortunately, you can program around this pretty easily, or pay a little extra for a retail tool like Texis' Webinator, which can parse JavaScript to figure out the links within a document.

Long story short, if it's on the public internet and isn't password-protected, you can't keep people from grabbing a copy. So what remedy do you have? Well, in the case of some guy in the Philippines, not much, frankly. Having had some experience with clients in that part of the world, I think they'd have to do something really serious before you had any chance of stopping them. On the other hand, are you really concerned that his site will be confused with yours? If someone in the US tries this, you have a much better chance of forcing them to stop.

Dave Watts, CTO, Fig Leaf Software
http://www.figleaf.com/
phone: 202-797-5496
fax: 202-797-5444
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|
Special thanks to the CF Community Suite Silver Sponsor - RUWebby
http://www.ruwebby.com
Message: http://www.houseoffusion.com/lists.cfm/link=i:4:186236
Archives: http://www.houseoffusion.com/cf_lists/threads.cfm/4
Subscription: http://www.houseoffusion.com/lists.cfm/link=s:4
Unsubscribe: http://www.houseoffusion.com/cf_lists/unsubscribe.cfm?user=11502.10531.4
Donations & Support: http://www.houseoffusion.com/tiny.cfm/54

