[ http://jira.dspace.org/jira/browse/DS-440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=11019#action_11019 ]
Mark Diggory commented on DS-440:
---------------------------------
[15:07] <stuartlewis> http://jira.dspace.org/jira/browse/DS-440 spiders.txt
empty
[15:08] <stuartlewis> Need input from mdiggory here
[15:08] <stuartlewis> My guess would be to ship a preconfigured list with 1.6,
and look at an update process for post 1.6
[15:08] <lcs> there's also a list of spider user-agent keywords in one of the
sitemaps to identify spiders..
[15:08] <mhwood> Anything we ship will be outdated. We need to document that in
big red letters.
[15:08] <lcs> would be good to merge that logic
[15:09] <tdonohue> +1 to shipping with some sort of list (or at least
documentation on how to format that spiders.txt file)
[15:09] <mhwood> +1 some list is better than no list
[15:09] <stuartlewis> Yes - 1.6.1 would need an update mechanism, a
preconfigured list should catch 90% until then
[15:09] <lcs> how about adding references to recommended websites to obtain
current lists of spider names?
[15:10] <stuartlewis> IIRC spiders.txt works on IP addresses. Could be upgraded
to include user-agent strings too.
[15:11] <tdonohue> So, should we leave assigned to mdiggory and come up with
some sort of list (even if it's just an example)?
[15:11] <stuartlewis> Yes - sounds sensible in the short timeframe we have
[15:11] <tdonohue> DS-440 Summary: Talk to mdiggory. Need to have some sort of
list and/or recommendations on how to get a current list.
[15:11] <stuartlewis> (any spider filtering is better than the current
situation of no spider filtering)
[15:11] <stuartlewis> http://jira.dspace.org/jira/browse/DS-441 - resolved
[15:11] <richardrodgers> +1 provided we make clear it needs maintenance..
[15:11] <lcs> see dspace-xmlui/dspace-xmlui-webapp/src/main/webapp/sitemap.xmap
for some detection logic
> spiders.txt empty
> -----------------
>
> Key: DS-440
> URL: http://jira.dspace.org/jira/browse/DS-440
> Project: DSpace 1.x
> Issue Type: Bug
> Affects Versions: 1.6.0
> Reporter: Stuart Lewis
> Assignee: Mark Diggory
> Fix For: 1.6.0
>
>
> spiders.txt is currently empty, so search engine robots are not being
> excluded from solr stats.
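
The approach raised in the discussion (an IP-based spiders.txt, possibly extended with user-agent keywords) could be sketched roughly as below. This is only an illustration, not DSpace's actual code: the class name SpiderFilter, its methods, and the assumed file format (one IP address per line, '#' for comments) are all hypothetical, and the addresses come from the reserved documentation range.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of spider detection; not the DSpace implementation. */
final class SpiderFilter {
    // Exact IP addresses treated as spiders (assumed spiders.txt content).
    private final Set<String> spiderIps = new HashSet<>();
    // Lower-cased keywords matched against the User-Agent header.
    private final List<String> agentKeywords = new ArrayList<>();

    /** Assumed format: one IP address per line; blank lines and '#' comments are skipped. */
    void loadIps(List<String> lines) {
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            spiderIps.add(trimmed);
        }
    }

    /** Registers a user-agent keyword; matching is case-insensitive. */
    void addAgentKeyword(String keyword) {
        agentKeywords.add(keyword.toLowerCase(Locale.ROOT));
    }

    /** Returns true if the IP is listed or the user agent contains a known keyword. */
    boolean isSpider(String ip, String userAgent) {
        if (ip != null && spiderIps.contains(ip.trim())) {
            return true;
        }
        if (userAgent == null) {
            return false;
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        for (String keyword : agentKeywords) {
            if (ua.contains(keyword)) {
                return true;
            }
        }
        return false;
    }
}
```

A statistics logger could call isSpider(ip, userAgent) before recording a hit and skip or flag the event when it returns true. With an empty list and no keywords, nothing is ever filtered, which matches the situation the issue describes.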