> We've just had one of our sites 'stolen' by someone in the Philippines
> using an application called iCollect... I got tracert logs etc and will 
> be following up on this....
>
> In the meantime, does anyone know of any IP / USER_AGENT filters that I
> can just drop into an application file so that it protects sites against
> this sort of thing....
> 
> If not, I'm thinking about writing something that will do this so that I
> can protect all the sites that we host using CF and I'd like opinions on 
> how I would go about implementing it...

Unfortunately, I think your best bet is not to worry about it. The most
important part of your site is the back-end functionality, which can't be
stolen this way.

There are two problems with any technical solution you might implement.
First, it may cause problems for legitimate users of your site. Second, it
can be circumvented by determined adversaries. While you may be able to
minimize the first problem, the second is insurmountable. All you can do is
make copying a bit more difficult.

What kind of problems might you cause legitimate users? First, you may end
up denying them access to your site altogether - they may be coming through
the same proxy as the bad guy. That's pretty serious for an ecommerce site.
Second, you may impede search engines, which may hurt your page rank and, in
turn, people's ability to find what they're looking for.

What kinds of technical solutions might you try, and how can they be
circumvented? I can think of three approaches:

1. Limiting requests from a single IP address during a period of time.
2. Limiting requests to internal files based on HTTP_REFERER (this is
   typically done to prevent links to images or other internal files from
   external sites).
3. Obfuscating the contents of your pages to defeat wget-style spiders,
   which read through a page looking for links to find out which pages to
   request next.

Again, all three of these approaches can be circumvented pretty easily.
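None of these approaches is CF-specific. As an illustration only, here's a
minimal sketch of the first one - a sliding-window, per-IP rate limiter -
written in Python; the thresholds are made-up numbers, and a real deployment
would need to store this state somewhere shared rather than in memory:

```python
import time
from collections import defaultdict, deque

# Hypothetical thresholds: block an IP that makes more than MAX_REQUESTS
# requests within WINDOW_SECONDS. Tune these for your own traffic.
MAX_REQUESTS = 30
WINDOW_SECONDS = 10

_hits = defaultdict(deque)  # ip -> timestamps of recent requests

def allow_request(ip, now=None):
    """Return True if this request should be served, False to block it."""
    now = time.time() if now is None else now
    window = _hits[ip]
    # Drop timestamps that have aged out of the window.
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) >= MAX_REQUESTS:
        return False  # too many recent requests from this IP
    window.append(now)
    return True
```

Note that this is exactly the approach that punishes legitimate users behind
a shared proxy: to the limiter, everyone behind that proxy is one IP.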

I've run into all three of these sorts of things before, and have been able
to overcome them with a little bit of work. As an expert witness in civil
cases in the state of Maryland, I have occasionally had to capture copies of
sites in their entirety at a specific time, and have had to work around
these issues. The last one is the most difficult, actually, since most
off-the-shelf spidering tools don't understand JavaScript. Fortunately, you
can program around this pretty easily, or pay a little extra for a retail
tool like Texis' Webinator, which can parse JavaScript to figure out the
links within a document. Long story short, if it's on the public internet
and isn't password-protected, you can't keep people from grabbing a copy.
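To make the circumvention point concrete: any HTTP client can send whatever
Referer and User-Agent headers it likes, so filters based on them stop only
the laziest copiers. A sketch in Python (the URL and header values here are
placeholders, not anything from the original site):

```python
import urllib.request

# Build a request that claims to come from the site itself, with an
# ordinary-looking browser User-Agent. Both values are freely chosen by
# the client - the server has no way to verify them.
req = urllib.request.Request(
    "http://www.example.com/images/photo.jpg",
    headers={
        "Referer": "http://www.example.com/",
        "User-Agent": "Mozilla/5.0 (compatible; OrdinaryBrowser)",
    },
)
# urllib.request.urlopen(req) would now sail past a naive
# HTTP_REFERER or User-Agent filter.
```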

So what remedy do you have? Well, in the case of some guy in the
Philippines, not much, frankly. Having had some experience with clients in
that part of the world, I think they'd have to do something really serious
before you had any chance of stopping them. On the other hand, are you
really concerned that his site will be confused with yours? If someone in
the US tries this, you have a much better chance of forcing them to stop.

Dave Watts, CTO, Fig Leaf Software
http://www.figleaf.com/
phone: 202-797-5496
fax: 202-797-5444
