[ http://jira.dspace.org/jira/browse/DS-440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stuart Lewis updated DS-440:
----------------------------
Attachment: [DS-440]_spiders_txt_is_empty.patch.txt
Attached is a patch that downloads lists of spider IP addresses from
iplists.com. The list of URLs to download from is configurable in
dspace.cfg.
At present, it downloads over 100,000 IP addresses. Some of these are
'computed': where a class C network is specified, the script automatically
adds each address from .0 to .255.
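
As a rough illustration of that expansion (this is not the code in the
patch), a class C prefix could be turned into its 256 addresses as below.
The function name expand_class_c and the sample prefix 192.0.2 are
assumptions made for this sketch, and it is written in Python for brevity
rather than in DSpace's Java.

```python
def expand_class_c(prefix):
    """Expand a three-octet prefix such as '192.0.2' into 256 addresses.

    Hypothetical helper for illustration; the input format (three dotted
    octets, optional trailing dot) is an assumption.
    """
    parts = prefix.strip().rstrip(".").split(".")
    valid = len(parts) == 3 and all(
        p.isdecimal() and 0 <= int(p) <= 255 for p in parts
    )
    if not valid:
        raise ValueError("expected a three-octet prefix, got %r" % (prefix,))
    base = ".".join(str(int(p)) for p in parts)
    # One entry per host number, .0 through .255 inclusive.
    return [base + "." + str(host) for host in range(256)]


addresses = expand_class_c("192.0.2")
print(len(addresses), addresses[0], addresses[-1])
```

Each class C entry in a downloaded list therefore contributes 256 strings
to the final set, which is how the total grows so quickly.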
I'm not sure this is the most efficient way of holding a set of IPs to ignore -
it is a very long list!
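
One possible alternative, sketched here purely as an idea and not as what
the patch does: keep each class C network as a single entry and compare
only the top 24 bits of an incoming address, so a network costs one set
member instead of 256. The class name SpiderIpSet and its methods are
hypothetical, and the example is in Python for brevity; it uses the
standard library ipaddress module.

```python
import ipaddress


class SpiderIpSet:
    """Hypothetical compact holder for spider IPs (IPv4 only)."""

    def __init__(self):
        self._single = set()   # individual addresses, stored as integers
        self._class_c = set()  # class C networks, stored as their top 24 bits

    def add_address(self, ip):
        self._single.add(int(ipaddress.IPv4Address(ip)))

    def add_class_c(self, prefix):
        # Accepts a three-octet prefix such as '192.0.2' (assumed format).
        network = ipaddress.IPv4Network(prefix.rstrip(".") + ".0/24")
        self._class_c.add(int(network.network_address) >> 8)

    def __contains__(self, ip):
        value = int(ipaddress.IPv4Address(ip))
        # Match either an exact address or the enclosing class C network.
        return value in self._single or (value >> 8) in self._class_c


spiders = SpiderIpSet()
spiders.add_class_c("192.0.2")
spiders.add_address("198.51.100.7")
print("192.0.2.200" in spiders, "198.51.100.8" in spiders)
```

Lookups stay constant time, and the memory used depends on the number of
list entries rather than on the number of expanded addresses.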
> spiders.txt empty
> -----------------
>
> Key: DS-440
> URL: http://jira.dspace.org/jira/browse/DS-440
> Project: DSpace 1.x
> Issue Type: Bug
> Affects Versions: 1.6.0
> Reporter: Stuart Lewis
> Assignee: Mark Diggory
> Fix For: 1.6.0
>
> Attachments: [DS-440]_spiders_txt_is_empty.patch.txt
>
>
> spiders.txt is currently empty, so search engine robots are not being
> excluded from solr stats.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://jira.dspace.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
_______________________________________________
Dspace-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-devel