Dear Jose,

Here are some relevant excerpts from the documentation at
http://www.dspace.org/1_6_2Documentation/

If they still leave you with questions, please elaborate, so we can improve
the documentation.

5.2. The dspace.cfg Configuration Properties File
5.2.49. DSpace SOLR Statistics Configuration

Property: solr.log.server
Example Value: solr.log.server = ${dspace.baseUrl}/solr/statistics
Informational Note: Is used by the SolrLogger Client class to connect to the
SOLR server over http and perform updates and queries.

Property: solr.spidersfile
Example Value: solr.spidersfile = ${dspace.dir}/config/spiders.txt
Informational Note: The spiders file is used by the SolrLogger. It is
populated by running the following command:
dsrun org.dspace.statistics.util.SpiderDetector -i <httpd log file>

Property: solr.dbfile
Example Value: solr.dbfile = ${dspace.dir}/config/GeoLiteCity.dat
Informational Note: This refers to the GeoLiteCity database file used by
LocationUtils to calculate the location of client requests based on IP
address. During the Ant build process (both fresh_install and update) this
file will be downloaded from http://www.maxmind.com/app/geolitecity if a new
version has been published or it is absent from your [dspace]/config
directory.

Property: useProxies
Example Value: useProxies = true
Informational Note: Will cause Statistics logging to look for the X-Forward
URI to detect the IP of clients that have accessed DSpace through a proxy
service. Allows detection of the client IP when accessing DSpace.

Property: statistics.item.authorization.admin
Example Value: statistics.item.authorization.admin = true
Informational Note: Enables access control restrictions on DSpace Statistics
pages. Restrictions are based on access rights to Community, Collection and
Item pages. This will require the user to sign on to see the statistics.
Setting this property to "false" will make the statistics publicly available.

Chapter 8. DSpace System Documentation: System Administration
8.15. Client Statistics

*Table 8.15. Client Statistics Command Table*

Command used: *[dspace]*/bin/dspace stats-util
Java class: org.dspace.statistics.util.StatisticsClient

Arguments (short and long forms), each followed by its description:

-u or --update-spider-files
    Update Spider IP Files from internet into /dspace/config/spiders.
    Downloads Spider files identified in dspace.cfg under property

-f or --delete-spiders-by-flag
    Delete Spiders in Solr By isBot Flag. Will prune out all records that
    have isBot:true

-i or --delete-spiders-by-ip
    Delete Spiders in Solr By IP Address. Will prune out all records that
    have IPs that match spider IPs.

-m or --mark-spiders
    Update isBot Flag in Solr. Marks any records currently stored in
    statistics that have IP addresses matched in spiders files

-h or --help
    Calls up this brief help table at CLI.

Notes:

Which of these options to use is up to the user. If they want to keep spider
entries in their repository, they can just mark them using "-m", and the
marked records will be excluded from statistics queries when
"solr.statistics.query.filter.isBot = true" is set in the dspace.cfg.

If they want to keep the spiders out of the solr repository, they can just
use the "-i" option and the matching records will be removed immediately.
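
To make the choice between the two approaches concrete, here is a minimal Python sketch that only builds the stats-util command lines described in the table above. The function names, the /dspace install path, and the idea of refreshing the spider files with "-u" before marking or deleting are my own assumptions for illustration; they are not part of DSpace.

```python
# Hypothetical install location; replace with your actual [dspace] directory.
DSPACE_DIR = "/dspace"

# Short options listed in Table 8.15 of the 1.6.2 documentation.
ALLOWED_OPTIONS = {"-u", "-f", "-i", "-m", "-h"}


def stats_util_command(option, dspace_dir=DSPACE_DIR):
    """Build the argument list for a single stats-util invocation."""
    if option not in ALLOWED_OPTIONS:
        raise ValueError(f"unknown stats-util option: {option}")
    return [f"{dspace_dir}/bin/dspace", "stats-util", option]


def spider_cleanup_commands(keep_spider_records, dspace_dir=DSPACE_DIR):
    """Return the commands for one cleanup pass.

    Assumption: the spider IP files are refreshed first (-u), then records
    are either marked (-m) or deleted by IP address (-i).
    """
    action = "-m" if keep_spider_records else "-i"
    return [
        stats_util_command("-u", dspace_dir),
        stats_util_command(action, dspace_dir),
    ]


if __name__ == "__main__":
    # Only prints the commands; on a real DSpace host each list could be
    # passed to subprocess.run(cmd, check=True) instead.
    for cmd in spider_cleanup_commands(keep_spider_records=True):
        print(" ".join(cmd))
```

With keep_spider_records=False the second command uses "-i" instead of "-m", which removes the matching records rather than flagging them.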

There are guards in place to control what can be defined as an IP range for
a bot. In [dspace]/config/spiders, spider IP address ranges have to be at
least 3 subnet sections in length (123.123.123), and IP ranges can only be on
the smallest subnet [123.123.123.0 - 123.123.123.255]. If an entry does not
meet these rules, loading that row will cause exceptions in the dspace logs
and that IP entry will be excluded.
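
If you want to check a spider file before DSpace loads it, the rules above can be approximated in a few lines of Python. This is only my reading of the documented guards, not the parser DSpace actually uses; the function names and the exact accepted line formats (a 3-section prefix, a full address, or a "start - end" range) are assumptions.

```python
def _octets(text):
    """Split a dotted string into integer octets, or return None if invalid."""
    parts = text.strip().split(".")
    for part in parts:
        if not (part.isascii() and part.isdigit() and int(part) <= 255):
            return None
    return [int(part) for part in parts]


def is_acceptable_spider_entry(line):
    """Apply the documented guards to one line of a spider IP file.

    Assumed accepted forms:
      123.123.123                        (at least 3 subnet sections)
      123.123.123.45                     (a full address)
      123.123.123.0 - 123.123.123.255    (range within the smallest subnet)
    """
    if "-" in line:
        start, _, end = line.partition("-")
        first, last = _octets(start), _octets(end)
        if first is None or last is None:
            return False
        if len(first) != 4 or len(last) != 4:
            return False
        # Both ends must share the first three sections.
        return first[:3] == last[:3] and first[3] <= last[3]
    octets = _octets(line)
    return octets is not None and len(octets) in (3, 4)


if __name__ == "__main__":
    for entry in ["123.123.123", "123.123", "123.123.0.0 - 123.123.255.255"]:
        verdict = "ok" if is_acceptable_spider_entry(entry) else "rejected"
        print(entry, "->", verdict)
```

Entries the sketch rejects are the ones I would expect to trigger the exceptions mentioned above, but the DSpace logs remain the authoritative check.
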
kindest regards,

Bram Luyten

@mire - http://www.atmire.com

Technologielaan 9 - 3001 Heverlee - Belgium
533 2nd Street - Encinitas, CA 92024 - USA

http://www.togather.eu - Before getting together, get t...@ther


On Mon, Jul 19, 2010 at 9:52 PM, Mark H. Wood <[email protected]> wrote:

> On Mon, Jul 19, 2010 at 10:52:26AM -0400, Blanco, Jose wrote:
> > I was looking over the dspace stats code to see if it had anything to
> remove counts from crawlers and I don't see anything in there.  I just
> wanted to make sure that is the case.
>
> Would that be the Solr-based stat. code new in 1.6?  In 1.6.0 there is
> a file called config/spiders.txt to contain a list of crawler IP
> addresses.  This was changed in a later point release to use multiple
> files found in config/spiders.  There's also a list of update URLs for
> spider lists configured in dspace.cfg as solr.spiderips.urls.
>
> There isn't much documentation, though.  We need to correct that.
>
> --
> Mark H. Wood, Lead System Programmer   [email protected]
> Balance your desire for bells and whistles with the reality that only a
> little more than 2 percent of world population has broadband.
>        -- Ledford and Tyler, _Google Analytics 2.0_
>
>
> ------------------------------------------------------------------------------
> This SF.net email is sponsored by Sprint
> What will you do first with EVO, the first 4G phone?
> Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first
> _______________________________________________
> DSpace-tech mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dspace-tech
>
>