Fresh Information:

Nov 1st 2013: IRUS-UK position statement on the treatment of robots and unusual usage
http://www.irus.mimas.ac.uk/news/IRUS-UK_position_statement_robots_and_unusual_usage_v1_0_Nov_2013.pdf

July 2013: IRUS download data – identifying unusual usage
http://www.irus.mimas.ac.uk/news/IRUS_download_data_Final_report.pdf

Cherry-picking a few interesting ideas:

- Automatically remove "overactive" IP addresses if they generate more than X downloads from the repository in a day
- Craft more intelligent threshold rules that take into account usage per day and per month for a particular IP and the user agents it uses:

Day metrics

    $day_hits = ($hit_count / $rfr_ip_count) / $agent_count;           # total hits per repository per user agent during day
    $day_hit_level = ($dist_hit_count / $rfr_ip_count) / $agent_count; # distinct hit per repository per user agent

Month metrics

    $month_hits = ($sum_hits / $num) / $agent_count_range;      # total hits per repository per user agent during month
    $month_hit_level = ($max_dist / $num) / $agent_count_range; # distinct hit per repository per user agent

Rules

    if (($month_hits > 100) and $month_hit_level > 40) { $flag = 'Check 1' }                # lot of use on day and in earlier month
    elsif (($day_hits > 10) and $month_hit_level > 10) { $flag = 'Check 2' }                # a day's use may be more significant than a month's
    elsif ($month_hits > 100 and ($sum_dist_hits / $sum_hits < .2)) { $flag = 'Check 3' }   # high proportion of hits with the same id
    else {
        if (($day_hits > 10) and $day_hit_level > 20) { $flag = 'Check 4' }                 # medium number of unique hits with same id
        elsif (($day_hits > 10) and ($day_hit_level / $day_hits < .2)) { $flag = 'Check 5' } # low number of hits with same id
    }
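
To make the first idea above concrete, here is a minimal sketch of dropping "overactive" IP addresses. It is written in Python rather than the Perl used in the IRUS report, and it is not how DSpace or IRUS-UK implement it. The function names, the (ip, day) event shape and the max_per_day parameter (the "X" above, for which the report excerpt gives no value) are all assumptions made for illustration.

```python
from collections import Counter


def overactive_ips(downloads, max_per_day):
    """Return the (ip, day) pairs with more than max_per_day downloads.

    `downloads` is an iterable of (ip, day) tuples, one per download event.
    This event shape is hypothetical; real usage events carry more fields.
    """
    counts = Counter(downloads)
    return {key for key, n in counts.items() if n > max_per_day}


def filter_downloads(downloads, max_per_day):
    """Drop every download made by an IP on a day it exceeded the threshold."""
    downloads = list(downloads)  # allow generators; we iterate twice
    flagged = overactive_ips(downloads, max_per_day)
    return [event for event in downloads if event not in flagged]


# Made-up sample data: one IP is very active on a single day.
sample = (
    [("192.0.2.1", "2013-11-01")] * 5
    + [("192.0.2.2", "2013-11-01")] * 2
    + [("192.0.2.1", "2013-11-02")]
)

if __name__ == "__main__":
    print(sorted(overactive_ips(sample, max_per_day=3)))
    print(len(filter_downloads(sample, max_per_day=3)))
```

Note that the threshold is applied per IP and per day, so an address that is only busy on one day keeps its downloads from other days. Whether flagged addresses should instead be excluded entirely is a policy choice this sketch leaves open.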
The current implementation of bot traffic filtering relies on IP lists. Even though using hostnames (as suggested here: https://jira.duraspace.org/browse/DS-790 ) could improve the situation, there are still forms of abusive traffic we might want to detect and exclude from stats. The most obvious example here would be repeated hits or downloads coming ...
_______________________________________________
Dspace-devel mailing list
Dspace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-devel