[ https://jira.duraspace.org/browse/DS-790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark H. Wood updated DS-790:
----------------------------
Status: Accepted (was: Code Review Needed)
See the recent comments on PR#231 by Peter Dietz. Historical data won't be
enclosed in an HttpRequest, and so requires a different method signature
taking request attribute strings.
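
To illustrate the kind of signature meant here, the following is a minimal
sketch only, not the code in the pull request. The class name SpiderMatcher,
the method isSpider(String, String, String), and all of the patterns and
addresses are hypothetical; a real implementation would presumably load its
patterns from configuration rather than hard-code them.

```java
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not the actual DSpace API. */
public class SpiderMatcher {
    // Assumed sample data: 192.0.2.1 is a documentation-only address.
    private static final Set<String> KNOWN_IPS = Set.of("192.0.2.1");

    // Illustrative patterns only, matched case-insensitively.
    private static final List<Pattern> HOST_PATTERNS = List.of(
            Pattern.compile("^msnbot-", Pattern.CASE_INSENSITIVE));
    private static final List<Pattern> AGENT_PATTERNS = List.of(
            Pattern.compile("msnbot", Pattern.CASE_INSENSITIVE),
            Pattern.compile("bingbot", Pattern.CASE_INSENSITIVE));

    /**
     * Works on plain strings, so it can be applied to stored historical
     * records as well as to live requests. Any argument may be null.
     */
    public static boolean isSpider(String clientIp, String hostname, String userAgent) {
        // Set.of(...) rejects null lookups, so guard before calling contains().
        boolean ipMatch = clientIp != null && KNOWN_IPS.contains(clientIp);
        return ipMatch
                || matchesAny(HOST_PATTERNS, hostname)
                || matchesAny(AGENT_PATTERNS, userAgent);
    }

    // True if any pattern is found anywhere in the value; null never matches.
    private static boolean matchesAny(List<Pattern> patterns, String value) {
        if (value == null) {
            return false;
        }
        for (Pattern p : patterns) {
            if (p.matcher(value).find()) {
                return true;
            }
        }
        return false;
    }
}
```

A request-based variant could then simply delegate to this method, passing
request.getRemoteAddr(), request.getRemoteHost() and
request.getHeader("User-Agent"), so that live and historical data share one
matching path.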
> SOLR - Spider detection to match on hostname or useragent
> ---------------------------------------------------------
>
> Key: DS-790
> URL: https://jira.duraspace.org/browse/DS-790
> Project: DSpace
> Issue Type: Improvement
> Components: Solr
> Affects Versions: 1.6.0, 1.6.1, 1.6.2, 1.7.0
> Environment: solr
> Reporter: Peter Dietz
> Assignee: Mark H. Wood
> Labels: has-pull-request
> Original Estimate: 0 minutes
> Remaining Estimate: 0 minutes
>
> Spiders are currently detected by matching their IP address against those
> listed in /dspace/config/spiders/ip-list-X.txt. However, when spiders change
> IP addresses, or the IP list goes unmaintained, many spiders can slip
> through, even though they will usually keep their user agent or hostname
> intact. I've noticed a sore point in my Solr data, where msnbot is completely
> unfiltered by Solr. There is an additional IP list:
> http://www.iplists.com/nw/msn.txt. However, it is very old, and with
> additional bingbots on the horizon, it would be easier to detect them and
> filter them out of the logs by user agent than to maintain all of the IP
> address ranges. The code to do this in Solr is unimplemented, and this ticket
> is a placeholder to encourage finishing the work to filter based on user
> agent / DNS hostname.
> To see all of the hits from msnbot that are unfiltered, look at:
> http://localhost:8080/solr/statistics/select?q=dns:msnbot*&facet=true&facet.field=dns&facet.mincount=1&facet.limit=5000
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
_______________________________________________
Dspace-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-devel