Re: Handling Bots and HTTP Clients

Volkan YAZICI Tue, 09 Dec 2014 09:00:51 -0800

The model I proposed will not buy us a significant performance gain, which
was also not my major motivation. (That being said, I also second the idea
of implementing a benchmark.) Instead, I wanted to address the issue of
separating the concerns of handling bots and regular devices.


Maybe I should rephrase my starting point: How can we add new bot and HTTP
client footprints to the existing DDR?

On Tue Dec 09 2014 at 2:31:24 PM Reza Naghibi
<[email protected]> wrote:

> So let me explain some of the issues with this. Regardless, I would still
> like you to benchmark said patch and share the results. This will help
> drive the direction of future work on the clients.
>
> 1) I'm almost certain isBot(ua) will perform worse than classify(ua),
> defeating the whole purpose of short-circuiting classify. How do you plan
> on implementing isBot()? If that algorithm performs better than classify(),
> we might as well use it to match the entire DDR. No?
>
> 2) Under no circumstances should we implement DDR logic in code. The code
> should remain as generic as possible. This means that it's just a plain
> old ngram matcher. This kind of logic belongs in the DDR definition. Right
> now this allows for patterns and ranking. So maybe what you are asking is
> that high-ranking patterns be checked for first in a very quick way? Well,
> why are bots so high ranking? In normal traffic, bots make up a very small
> percentage. So wouldn't it make more sense to check for Samsung and Apple
> products first?
>
> Once again, if possible, please benchmark some befores and afters so we can
> get a better idea of what we are working with here. Even though I'm leaning
> towards saying this is a bad idea, I think it is a good exercise.
>
>
>  From: Volkan YAZICI <[email protected]>
>  To: "[email protected]" <devicemap-dev@incubator.apache.org>
>  Sent: Tuesday, December 9, 2014 7:34 AM
>  Subject: Handling Bots and HTTP Clients
>
> Hello,
>
> In the context of the discussion "how do we handle HTTP clients", I would
> like to vote for treating them as bots. Further, I want to propose adding a
> thin layer above DeviceMapClient.classify() that short-circuits the
> handling of bots, as follows.
>
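> private static final Map<String, String> botAttributes =
>         Collections.singletonMap("is_bot", "true");
>
> public Map<String, String> classify(String userAgent) {
>     // Short-circuit: known bots skip the full DDR match.
>     if (isBot(userAgent)) return botAttributes;
>     // Otherwise delegate to the regular DDR match; "client" stands for
>     // the wrapped DeviceMapClient instance.
>     return client.classify(userAgent);
> }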
>
> The motivation for this change is as follows:
>
>   - Almost none of the attributes make sense for a bot, so we lose time
>   matching it against the whole DDR.
>   - The bot database will be able to evolve independently.
>   - We can come up with a single compiled j.u.regex.Pattern to check bots.
>   (I am pretty sure Reza knows far better performing approaches, but
>   maybe for a future release.)
>
> If the development team is OK with that, I would like to implement this
> feature.
>
> Best.
>
>
>
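
For reference, the "single compiled j.u.regex.Pattern" idea from the quoted
proposal could look roughly like the sketch below. This is only an
illustration under assumptions: the class name BotShortcut, the footprint
list, and the Function-typed delegate (standing in for whatever performs the
regular DDR match) are all hypothetical and not part of DeviceMap. It has
not been benchmarked, so it says nothing about whether isBot() is faster
than classify().

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class BotShortcut {
    // Hypothetical footprints, for illustration only; a real list would
    // live in its own data file so it can evolve independently.
    private static final List<String> BOT_FOOTPRINTS =
            Arrays.asList("Googlebot", "bingbot", "curl/", "Wget/", "python-requests");

    // Quote each footprint so it is matched literally, then join all of
    // them into one case-insensitive alternation compiled exactly once.
    private static final Pattern BOT_PATTERN = Pattern.compile(
            BOT_FOOTPRINTS.stream().map(Pattern::quote).collect(Collectors.joining("|")),
            Pattern.CASE_INSENSITIVE);

    private static final Map<String, String> BOT_ATTRIBUTES =
            Collections.singletonMap("is_bot", "true");

    // Assumption: stands in for the regular DDR-based classifier.
    private final Function<String, Map<String, String>> delegate;

    public BotShortcut(Function<String, Map<String, String>> delegate) {
        this.delegate = delegate;
    }

    // True if any footprint occurs anywhere in the User-Agent string.
    public static boolean isBot(String userAgent) {
        return userAgent != null && BOT_PATTERN.matcher(userAgent).find();
    }

    // Bots get the fixed attribute map; everything else is delegated.
    public Map<String, String> classify(String userAgent) {
        if (isBot(userAgent)) {
            return BOT_ATTRIBUTES;
        }
        return delegate.apply(userAgent);
    }
}
```

Passing the delegate in as a function keeps the sketch independent of any
particular client class; a wrapper around the real client would be supplied
in its place.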
