Re: Handling Bots and HTTP Clients

Reza Naghibi Tue, 09 Dec 2014 05:31:47 -0800

So let me explain some of the issues with this. Regardless, I would still like 
you to benchmark said patch and share the results. This will help drive the 
direction of future work on the clients.


1) I'm almost certain isBot(ua) will perform worse than classify(ua), 
defeating the whole purpose of short-circuiting classify(). How do you plan on 
implementing isBot()? If that algorithm performs better than classify(), we 
might as well use it to match the entire DDR, no?

2) Under no circumstances should we implement DDR logic in code. The code 
should remain as generic as possible, which means it is just a plain old 
ngram matcher. This kind of logic belongs in the DDR definition, which right 
now allows for patterns and ranking. So maybe what you are asking is that 
high-ranking patterns be checked first in a very quick way? Well, why should 
bots rank so high? In normal traffic, bots make up a very small percentage. 
So wouldn't it make more sense to check for Samsung and Apple products first?

Once again, if possible, please benchmark some before and afters so we can get 
a better idea of what we are working with here. Even though I'm leaning towards 
saying this is a bad idea, I think it is a good exercise.
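
A rough before/after comparison could be sketched along the following lines.
This is only an illustration: the class and method names are hypothetical, the
classifier is passed in as a plain function rather than a real
DeviceMapClient, and a naive System.nanoTime() loop is far less reliable than
a proper harness such as JMH.

```java
import java.util.List;
import java.util.function.Function;

public class ClassifyBenchmark {
    /**
     * Returns the average time in nanoseconds per classifier call.
     * Assumes userAgents is non-empty and rounds is positive.
     */
    public static double averageNanos(Function<String, ?> classifier,
                                      List<String> userAgents,
                                      int rounds) {
        long sink = 0;

        // Warm-up pass so the JIT has a chance to compile the hot path.
        for (int i = 0; i < rounds; i++) {
            for (String ua : userAgents) {
                sink += System.identityHashCode(classifier.apply(ua));
            }
        }

        // Timed pass over the same inputs.
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            for (String ua : userAgents) {
                sink += System.identityHashCode(classifier.apply(ua));
            }
        }
        long elapsed = System.nanoTime() - start;

        // Use the accumulated value so the calls are not optimized away.
        if (sink == 42) {
            System.out.print("");
        }
        return (double) elapsed / ((long) rounds * userAgents.size());
    }
}
```

Running this once with the plain classify() and once with the patched version,
over the same list of user agents taken from real traffic, would give the
before and after numbers.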


From: Volkan YAZICI <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Tuesday, December 9, 2014 7:34 AM
Subject: Handling Bots and HTTP Clients

Hello,

In the context of the discussion "how do we handle HTTP clients", I would like
to vote for treating them as bots. Further, I want to propose adding a thin
layer above DeviceMapClient.classify() that short-circuits the handling of
bots, as follows.

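private static final Map<String, String> botAttributes =
        Collections.singletonMap("is_bot", "true");

public Map<String, String> classify(String userAgent) {
    // Short-circuit: skip the full DDR match when the client is a bot.
    if (isBot(userAgent)) return botAttributes;
    // Otherwise continue with the existing classify() logic (omitted here).
}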

The motivation for this change is as follows:

  - Almost none of the attributes make sense for a bot, and we lose time
  matching it against the whole DDR.
  - The bot database will be able to evolve independently.
  - We can come up with a single compiled j.u.regex.Pattern to check for bots.
  (I am pretty sure Reza knows approaches that perform a lot better, but
  those could wait for a future release.)
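
The single-pattern idea in the last point could be sketched roughly as below.
Everything here is an assumption made for illustration: the class name, the
token list in the pattern, and the delegate function standing in for the
existing DeviceMapClient.classify() are all hypothetical, not part of the
actual client.

```java
import java.util.Collections;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;

public class BotShortCircuit {
    // Hypothetical token list; a real bot database would be maintained separately.
    private static final Pattern BOT_PATTERN = Pattern.compile(
            "googlebot|bingbot|crawler|spider|curl|wget",
            Pattern.CASE_INSENSITIVE);

    private static final Map<String, String> BOT_ATTRIBUTES =
            Collections.singletonMap("is_bot", "true");

    // Stand-in for the existing full classification against the DDR.
    private final Function<String, Map<String, String>> delegate;

    public BotShortCircuit(Function<String, Map<String, String>> delegate) {
        this.delegate = delegate;
    }

    /** True if any bot token occurs anywhere in the user agent string. */
    public static boolean isBot(String userAgent) {
        return userAgent != null && BOT_PATTERN.matcher(userAgent).find();
    }

    public Map<String, String> classify(String userAgent) {
        // Bots get a fixed attribute map and never reach the full matcher.
        if (isBot(userAgent)) {
            return BOT_ATTRIBUTES;
        }
        return delegate.apply(userAgent);
    }
}
```

Note that every non-bot user agent pays for the regex scan before the full
match runs, so whether this is a net win depends on the share of bot traffic.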

If the development team is ok with that, I want to implement this feature.

Best.
