Hi Willy,

Thanks a lot, this helps a lot. Much appreciated.
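For anyone following the thread, here is a minimal sketch of the base32+src rate limit discussed below. The frontend/backend names, the 10-second window, and the 50-request threshold are only assumptions; as Willy suggests, set the threshold around 10 times the per-URL request rate you normally observe:

    frontend web
        bind :80
        # base32+src yields an 8-byte binary key for IPv4
        # (32-bit URL hash + 32-bit source address)
        stick-table type binary len 8 size 1m expire 10s store http_req_rate(10s)
        http-request track-sc0 base32+src
        # Deny clients hammering the same URL far above normal pace
        http-request deny if { sc0_http_req_rate gt 50 }
        default_backend app

The trade-off mentioned earlier is that the table key is binary rather than plain text, so entries in "show table" output are hashes, not readable URLs and addresses.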
Regards,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Long Wu Yuan 龙武缘
Sr. Linux Engineer 高级工程师
ChinaNetCloud 云络网络科技(上海)有限公司 | www.ChinaNetCloud.com
1238 Xietu Lu, X2 Space 1-601, Shanghai, China | 中国上海市徐汇区斜土路1238号X2空间1-601室
24x7 Support Hotline: +86-400-618-0024 | Office Tel: +86-(21)-6422-1946
We are hiring! http://careers.chinanetcloud.com | Customer Portal - https://customer-portal.service.chinanetcloud.com/
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On Jan 31, 2015, at 9:32 PM, Willy Tarreau <[email protected]> wrote:

> Hi guys,
>
> On Tue, Jan 27, 2015 at 06:01:13AM +0800, Yuan Long wrote:
>> I am in the same fix.
>> No matter what we try, the data to address is the real
>> laptop/desktop/cellphone/server count. That count is skewed as soon as
>> there are a hundred laptops/desktops behind a router.
>>
>> The best I heard is from Willy himself, a suggestion to use base32+src. At
>> the cost of losing plain text and having binary values to use in ACLs, but
>> it works for now. Grateful to have HAProxy in the first place.
>
> There's no universal rule. Everything depends on how the site is made,
> and how the bad guys are acting. For example, some sites may work very
> well with a rate limit on base32+src. That could be the case when you
> want to prevent a client from mirroring a whole web site. But for sites
> with very few URLs, it could be another story. Conversely, some sites
> will provide lots of different links to various objects. Think for
> example about a merchant's site where each photo of an object for sale is
> a different URL. You wouldn't want to block users who simply click on
> "next" and get 50 new photos each time.
>
> So the first thing to do is to define how the site is supposed to work.
> Next, you define what is a bad behaviour, and how to distinguish between
> intentional bad behaviour and accidental bad behaviour (eg: people who
> have to hit reload several times because of a poor connection). For most
> sites, you have to keep in mind that it's better to let some bad users
> pass through than to block legitimate users. So you want to put the cursor
> on the business side and not on the policy-enforcement side.
>
> Proxies, firewalls etc. make the problem worse, but not too much in general.
> You'll easily see some addresses sending 3-10 times more requests than
> others because they're proxying many users. But if you realize that a valid
> user may also reach that level of traffic in regular use of the site, it's
> a threshold you have to accept anyway. What would be unlikely, however, is
> that surprisingly all users behind a proxy browse on steroids. So setting
> blocking levels 10 times higher than the average pace you normally observe
> might already give very good results.
>
> If your site is very special and needs to enforce strict rules against
> sucking or spamming (eg: forums), then you may need to identify the client
> and observe cookies. But then there are even fewer generic rules; it totally
> depends on the application and the sequence used to access the site. To be
> transparent on this subject, we've been involved in helping a significant
> number of sites under abuse or attack at HAProxy Technologies, and it
> turns out that whatever new magic tricks you find for one site are often
> irrelevant to the next one. Each time you have to go back to pencil and
> paper, write down the complete browsing sequence, and find a few subtle
> elements there.
>
> Regards,
> Willy
>

