Let me explain some of the issues I see with this. Regardless, I would still like
you to benchmark the patch and share the results. This will help drive the
direction of future work on the clients.
1) I'm almost certain isBot(ua) will perform worse than classify(ua), defeating
the whole purpose of short-circuiting classify(). How do you plan on implementing
isBot()? If that algorithm performs better than classify(), we might as well
use it to match the entire DDR, no?
2) Under no circumstances should we implement DDR logic in code. The code
should remain as generic as possible, which means it is just a plain old
ngram matcher. This kind of logic belongs in the DDR definition, which right now
allows for patterns and ranking. So maybe what you are asking is that high-ranking
patterns be checked first in a very quick way? Well, why are bots ranked so
high? In normal traffic, bots make up a very small percentage. So wouldn't it
make more sense to check for Samsung and Apple products first?
Once again, if possible, please benchmark some before-and-after runs so we can get
a better idea of what we are working with here. Even though I'm leaning towards
saying this is a bad idea, I think it is a good exercise.
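For what it is worth, a first before-and-after comparison does not need much machinery. The following is only a rough sketch: ClassifyBench, nanosPerCall and the sink field are hypothetical names made up for illustration, and the function under test is passed in so the same harness can time either classify() or a candidate isBot(). Numbers from a hand-rolled loop like this are noisy, so a proper harness such as JMH would be preferable for anything we base decisions on.

```java
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

// Crude timing harness (illustrative only, not part of DeviceMap).
final class ClassifyBench {
    // Written once per pass so the JIT cannot discard the computed results.
    static volatile long sink;

    // Average nanoseconds per call of fn over the given user agents.
    static double nanosPerCall(Function<String, ?> fn, List<String> userAgents, int rounds) {
        run(fn, userAgents, rounds);                 // warm-up pass, not timed
        long start = System.nanoTime();
        run(fn, userAgents, rounds);                 // measured pass
        long elapsed = System.nanoTime() - start;
        return (double) elapsed / ((long) rounds * userAgents.size());
    }

    private static void run(Function<String, ?> fn, List<String> userAgents, int rounds) {
        long acc = 0;
        for (int i = 0; i < rounds; i++) {
            for (String ua : userAgents) {
                acc += Objects.hashCode(fn.apply(ua));
            }
        }
        sink = acc;
    }
}
```

Assuming a DeviceMapClient instance named client and a list of real user agents sampled from traffic logs, the "before" number would come from something like ClassifyBench.nanosPerCall(client::classify, userAgents, 1000), and the "after" number from the same call against the patched classify().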
From: Volkan YAZICI <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Tuesday, December 9, 2014 7:34 AM
Subject: Handling Bots and HTTP Clients
Hello,
In the context of the discussion "how do we handle HTTP clients", I would like
to vote for treating them as bots. Further, I want to propose adding a thin
layer above DeviceMapClient.classify() as a shortcut for handling bots,
as follows.
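    private static final Map<String, String> botAttributes =
            Collections.singletonMap("is_bot", "true");
    public Map<String, String> classify(String userAgent) {
        // Known bots skip the full DDR match.
        if (isBot(userAgent)) return botAttributes;
        // Otherwise fall back to the existing lookup; "client" is an
        // assumed DeviceMapClient field, not shown in this message.
        return client.classify(userAgent);
    }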
The motivation for this change is as follows:
- Almost none of the attributes make sense for a bot, so we lose time
matching it against the whole DDR.
- The bot database will be able to evolve independently.
- We can come up with a single compiled java.util.regex.Pattern to check for bots.
(I am pretty sure Reza knows far better-performing approaches, but maybe
those are for a future release.)
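To make the single-Pattern idea concrete, here is a minimal sketch. Everything in it beyond java.util.regex is an assumption: BotShortcut is a hypothetical wrapper name, the token list is a made-up placeholder for a real bot database, and the delegate function stands in for DeviceMapClient.classify() so the example runs on its own.

```java
import java.util.Collections;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;

// Illustrative sketch only; not part of DeviceMap.
final class BotShortcut {
    // Placeholder tokens (assumption). A real list would come from the bot database.
    private static final Pattern BOT_PATTERN = Pattern.compile(
            "bot|crawler|spider|curl|wget", Pattern.CASE_INSENSITIVE);

    private static final Map<String, String> BOT_ATTRIBUTES =
            Collections.singletonMap("is_bot", "true");

    // Stand-in for DeviceMapClient.classify(), injected to keep the sketch standalone.
    private final Function<String, Map<String, String>> delegate;

    BotShortcut(Function<String, Map<String, String>> delegate) {
        this.delegate = delegate;
    }

    // True if any bot token occurs anywhere in the user agent.
    static boolean isBot(String userAgent) {
        return userAgent != null && BOT_PATTERN.matcher(userAgent).find();
    }

    // Bots get the fixed attribute map; everything else goes to the delegate.
    Map<String, String> classify(String userAgent) {
        if (isBot(userAgent)) return BOT_ATTRIBUTES;
        return delegate.apply(userAgent);
    }
}
```

A plain substring alternation like this is easy to maintain but prone to false positives: the token "bot" would also match a device brand such as Cubot, so a real pattern would need tighter anchoring or a curated list.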
If the development team is OK with that, I would like to implement this feature.
Best.