[ http://jira.dspace.org/jira/browse/DS-364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stuart Lewis updated DS-364:
----------------------------
Attachment: _[DS-364]-for-review_patch_-_version2.patch
New version of the patch (StatisticsImporter only, ClassicDSpaceLogConverter is
not changed) based on very valuable feedback and live test data from Peter
Dietz. The new version has a few more features and has been tested against his
log file:
- Removes hits from googlebot, yahoo slurp, and msnbot
- Makes use of a reverse DNS LRU cache of 2500 entries, so a good proportion
of IP addresses are cached and not resolved again
- Option ("-s" parameter) to skip reverse DNS lookups; search engine spiders
are then not removed
- Option to replace DSO IDs with values that exist locally, for testing
purposes using log files from remote systems
- Relies on patch DS-445 to add a new findAll() method to Bitstream.java
TODO: Add a "-h" parameter to show help options
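For anyone reviewing the caching and spider-filtering points above, here is a
minimal sketch of the general technique, not the code in the patch. It assumes
an access-ordered LinkedHashMap as the LRU structure. The class name
ReverseDnsCache, the methods hostFor() and isSpider(), and the hostname
suffixes are all hypothetical and chosen only for illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Illustrative only: a small LRU cache in front of reverse DNS lookups. */
final class ReverseDnsCache {
    // Assumed hostname suffixes for the three spiders; the patch may match differently.
    private static final String[] SPIDER_SUFFIXES = {
        ".googlebot.com", ".crawl.yahoo.net", ".search.msn.com"
    };

    private final Map<String, String> cache;
    private final Function<String, String> resolver;

    ReverseDnsCache(final int maxEntries, Function<String, String> resolver) {
        this.resolver = resolver;
        // accessOrder = true keeps the least recently used entry first.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                // Drop the least recently used entry once the limit is exceeded.
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached hostname for an IP, resolving it only on a miss. */
    String hostFor(String ip) {
        String host = cache.get(ip); // get() also marks the entry as recently used
        if (host == null) {
            host = resolver.apply(ip);
            cache.put(ip, host);
        }
        return host;
    }

    int size() {
        return cache.size();
    }

    /** Real reverse lookup; falls back to the IP text if it cannot be resolved. */
    static String reverseLookup(String ip) {
        try {
            return InetAddress.getByName(ip).getHostName();
        } catch (UnknownHostException e) {
            return ip;
        }
    }

    /** True if the hostname ends with one of the assumed spider suffixes. */
    static boolean isSpider(String host) {
        String lower = host.toLowerCase(Locale.ROOT);
        for (String suffix : SPIDER_SUFFIXES) {
            if (lower.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }
}
```

With something like new ReverseDnsCache(2500, ReverseDnsCache::reverseLookup),
each log line would call hostFor(ip) and skip the hit when isSpider() returns
true. Skipping the lookup entirely (as with "-s") means there is no hostname to
test, which is why spiders are not removed in that mode.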
Typical output now shows:
/dspace/bin/dsrun org.dspace.statistics.util.StatisticsImporter -i
/Users/stuartlewis/Downloads/solroutput.log.2009-06-15 -l
Loading local communities... Found 2
Loading local collections... Found 3
Loading local items... Found 1369
Loading local bitstreams... Found 87
Processing file: /Users/stuartlewis/Downloads/solroutput.log.2009-06-15
Processed 28706 log lines
- 5158 entries added to solr: 17.968%
- 12 errors: 0.042%
- 23536 search engine activity skipped: 81.99%
About to commit data to solr... done!
Over 80% of hits are from the three major search engines and are removed.
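As a quick check of the figures above, the three counts add up to the 28706
processed lines, and each percentage is the count divided by that total,
rounded to three decimal places. A minimal sketch of that arithmetic follows;
the class and method names are hypothetical and this is not the code from the
patch.

```java
/** Illustrative only: reproduces the percentages in the summary output. */
final class ImportSummary {
    /** Percentage of total, rounded to three decimal places. */
    static double percent(long part, long total) {
        return Math.round(100000.0 * part / total) / 1000.0;
    }

    public static void main(String[] args) {
        long total = 28706;
        System.out.println(percent(5158, total) + "%");  // entries added to solr
        System.out.println(percent(12, total) + "%");    // errors
        System.out.println(percent(23536, total) + "%"); // search engine activity skipped
    }
}
```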
> Script to convert legacy dspace.log stats into solr stats records
> -----------------------------------------------------------------
>
> Key: DS-364
> URL: http://jira.dspace.org/jira/browse/DS-364
> Project: DSpace 1.x
> Issue Type: Sub-task
> Reporter: Stuart Lewis
> Assignee: Stuart Lewis
> Fix For: 1.6.0
>
> Attachments: [DS-364]-for-review.patch,
> _[DS-364]-for-review_patch_-_version2.patch, solroutput.log.2009-06-15
>
>