An update on how they do this. 'They' let the user download a signed applet to their machine, along with an encrypted config file containing the rules etc. for parsing the good bits out of the various sites they 'scrape'. The parsing is all done with regular expressions, so changing the HTML doesn't help a great deal.
When the end user runs a search using this applet, the applet fires the required form, URL or cookie variables at our search engine and brings back the data returned, obviously excluding all our branding etc. Because the requests come from the end users' own machines, there is no real pattern to this and no real way I can see of effectively stopping it for any length of time.

Any ideas?

-----Original Message-----
From: Simon Horwith [mailto:[EMAIL PROTECTED]
Sent: 15 September 2003 22:14
To: [EMAIL PROTECTED]
Subject: RE: [ cf-dev ] Stopping 'db scrapes'

If you suspect the IP address changes and that the data on their site is updated as recently as you say, they must be making frequent requests. I'd stuff some code into Application.cfm that records the number of requests made from any given IP address. Each hour (or two or three), log the data: record it in a database, email it to yourself or whatever. Look for a pattern. When you have found the pattern, you may be able to use that data in a legal capacity. Whether or not you can use this data in a legal capacity, you certainly could use it to prevent requests from being delivered to undesirable recipients ;)

~Simon

Simon Horwith
CTO, Etrilogy Ltd.
Member of Team Macromedia
Macromedia Certified Instructor
Certified Advanced ColdFusion MX Developer
Certified Flash MX Developer
CFDJList - List Administrator
http://www.how2cf.com/

-----Original Message-----
From: Snake Hollywood [mailto:[EMAIL PROTECTED]
Sent: 15 September 2003 14:26
To: [EMAIL PROTECTED]
Subject: RE: [ cf-dev ] Stopping 'db scrapes'

You can also block their IP address in IIS.

Russ Michaels
Macromedia Certified ColdFusion Professional
--
Satachi Internet Development
t: 0870 7873610
f: 07092 212636
tech support: 0906 960 7800
www.satachi.com
Join our ColdFusion developer community list, send email to: [EMAIL PROTECTED]
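
Simon's suggestion above (count the requests made from each IP address, log the totals every hour or so, and look for a pattern) could be sketched roughly as follows. This is only an illustration in Python, not ColdFusion; in a CF application the same bookkeeping would presumably live in Application.cfm and key on CGI.REMOTE_ADDR. The class name, the one-hour window and the threshold of 500 are all assumptions made for the example, not anything from the thread. Note also that if the requests really do come from end users' own machines, they will be spread over many addresses and a per-IP count may show little.

```python
from collections import defaultdict
import time


class RequestCounter:
    """Counts requests per client IP inside fixed time windows.

    Hypothetical helper: the name, window length and threshold are
    illustrative assumptions, not values taken from the thread.
    """

    def __init__(self, window_seconds=3600, threshold=500):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.window_start = None
        self.counts = defaultdict(int)
        # Finished windows, kept as (window_start, {ip: count}) so they can
        # later be written to a database or emailed, as Simon suggests.
        self.history = []

    def record(self, ip, now=None):
        """Register one request from `ip`; return its count in this window."""
        now = time.time() if now is None else now
        if self.window_start is None:
            self.window_start = now
        # When the current window has expired, archive it and start a new one.
        if now - self.window_start >= self.window_seconds:
            self.history.append((self.window_start, dict(self.counts)))
            self.counts.clear()
            self.window_start = now
        self.counts[ip] += 1
        return self.counts[ip]

    def suspects(self):
        """IPs whose count in the current window exceeds the threshold."""
        return sorted(ip for ip, n in self.counts.items() if n > self.threshold)
```

Calling `record()` once per incoming request and checking `suspects()` (or the archived `history`) periodically gives the raw data for spotting a pattern; any address that shows up repeatedly could then be blocked at the web server, as Russ suggests.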

-----Original Message-----
From: Rich Wild [mailto:[EMAIL PROTECTED]
Sent: 15 September 2003 14:08
To: '[EMAIL PROTECTED]'
Subject: RE: [ cf-dev ] Stopping 'db scrapes'

> This hasn't stopped them so far, we even went as far as
> 'randomising' the names of the query parameters used in the
> search that gets scraped for each visit, and they cracked
> the formula.

If your site has a valid copyright statement about the use of the data, then probably the only thing you could do against such a stubborn pirate is to begin legal action. Even Peter's graphical option won't be much use if they hire someone to sit there and manually scrape the data, which they may well do if they're half as determined as they sound. Some legal action and bad publicity can do wonders to prevent this sort of thing, as we've discovered before.

The Flash thing *is* an option (but again, it won't stop someone manually writing out all the data). But do you a) want your customers to have to use Flash as an interface and b) want to go through all the hassle of creating the Flash interface, with the looming dangers that the Flash plugin might be facing at the moment in browsers? (oooh, controversial).

> -----Original Message-----
> From: Peter Dray [mailto:[EMAIL PROTECTED]
> Sent: 15 September 2003 13:54
> To: [EMAIL PROTECTED]
> Subject: RE: [ cf-dev ] Stopping 'db scrapes'
>
> Rich:
>
> > - keep changing the design (or just HTML) so that they have to change
> >   their parsing code
>
> This hasn't stopped them so far, we even went as far as 'randomising' the
> names of the query parameters used in the search that gets scraped for each
> visit, and they cracked the formula.
>
> Paul:
>
> >Going just on what you posted
> >Would it be possible to check that CGI.HTTP_REFERER contains the
> >current domain name ? So only pages in the site can call subpages and
> >they cannot be called direct ?
> >Or are they just viewing a dynamic page that anyone can see ?
> >
> >After all .. file - save as ;)
>
> What happens is the end user buys this bit of software from the competitor
> and this software sends a request to our site's search engine. I assume the
> software spoofs a user_agent and looks like any other user. And yes, they
> just request a dynamic page and then parse out the good bits.
>
> I should add that we are not the only site that this software 'scrapes', we
> just seem to be the only ones who can be arsed to try and stop them.
>
> --
> ** Archive: http://www.mail-archive.com/dev%40lists.cfdeveloper.co.uk/
>
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> For human help, e-mail: [EMAIL PROTECTED]
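
Paul's question about checking CGI.HTTP_REFERER against the current domain could be sketched along these lines. Again this is illustrative Python rather than CFML; the function name and the example host names are made up for the sketch. It compares the parsed host of the referer instead of doing a plain "contains" test, because a substring match would also accept a foreign URL that merely mentions the domain in its query string. As Peter's reply implies, a client that already spoofs its user agent can send a fake referer just as easily, so at best this raises the bar slightly.

```python
from urllib.parse import urlparse


def referer_is_same_site(referer, site_host):
    """Return True if `referer` points at `site_host` or one of its subdomains.

    `referer` is the raw Referer header value (CGI.HTTP_REFERER in
    ColdFusion); `site_host` is the site's own host name. Both the function
    and its parameters are hypothetical names for this example.
    """
    if not referer:
        # Direct requests (no referer at all) are treated as off-site.
        return False
    host = urlparse(referer).hostname
    if host is None:
        # Not a parseable absolute URL.
        return False
    site_host = site_host.lower()
    # urlparse already lower-cases the hostname it returns.
    return host == site_host or host.endswith("." + site_host)


# Example with made-up host names:
# referer_is_same_site("http://www.example.com/search.cfm", "www.example.com")
# is accepted, while a referer on any other host, or none at all, is rejected.
```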
