Scott:

A good place to start reading about "bot spam" and its impact on the data is https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/BotDetection

We recently released a new classification for traffic. Besides classifying traffic as "user" or "spider", we now also have "automated", which tags traffic from a number of entities (but not all) that can be described as "high-volume spammers". You will probably have some questions after reading the doc, and we can set up a meeting to go over those.

Thanks,

Nuria

On Tue, Jun 16, 2020 at 9:55 AM Scott Bassett <[email protected]> wrote:

> Hello Analytics Team-
>
> The Security Team has recently spent some cycles investigating improved
> anti-automation (bad bots, high-volume spammers, etc.) solutions,
> particularly around an improved Wikimedia captcha. We were curious if your
> team has any methods or advice regarding the analysis of nefarious
> automated traffic within the context of raw web requests or any other
> relevant analytics data. If the answer is "not really", that's fine. But
> if there are some relevant tools, methods, research, etc. your team has
> performed that you would like to share with us, that would be much
> appreciated. If it makes sense to discuss this further during a quick
> call, I can try to find some time for a few of us over the next couple of
> weeks. We also have an extremely barebones task where we are attempting to
> document various methods of measurement which might be helpful:
> https://phabricator.wikimedia.org/T255208.
>
> Thanks,
>
> --
> Scott Bassett
> [email protected]
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
