[ 
http://jira.dspace.org/jira/browse/DS-364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stuart Lewis updated DS-364:
----------------------------

    Attachment: _[DS-364]-for-review_patch_-_version2.patch

New version of the patch (StatisticsImporter only; ClassicDSpaceLogConverter is
unchanged), based on very valuable feedback and live test data from Peter
Dietz. The new version adds a few features and has been tested against his
log file:

 - Removes hits from googlebot, yahoo slurp, and msnbot
 - Uses a reverse DNS LRU cache of 2500 entries, so a good proportion of IP
addresses are cached and not resolved again
 - "-s" parameter: option to skip reverse DNS lookups (search engine spiders
are then not removed)
 - Option to replace DSO IDs with values that exist locally, for testing with
log files from remote systems
 - Relies on patch DS-445, which adds a new findAll() method to Bitstream.java
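
To illustrate the caching and spider filtering described in the list above,
here is a minimal sketch. It is not the code from the patch: the class name,
the method names, and the hostname suffixes used to recognise the three
spiders are all assumptions made for the example. It only shows one common
way to get a 2500-entry LRU cache in Java (an access-ordered LinkedHashMap)
and to check a resolved hostname against a list of spider domains.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not the StatisticsImporter code from the patch.
class ReverseDnsCacheSketch {

    /** Fixed-size LRU map built on an access-ordered LinkedHashMap. */
    static class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true); // true = keep entries in access order
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity; // drop the least recently used entry
        }
    }

    // Assumed hostname suffixes for the three spiders; the patch may match differently.
    static final String[] SPIDER_SUFFIXES = {"googlebot.com", "crawl.yahoo.net", "search.msn.com"};

    // 2500 entries, as stated in the email.
    static final Map<String, String> CACHE = new LruCache<>(2500);

    /** Real reverse lookup; falls back to the IP text if the address is rejected. */
    static String reverseDns(String ip) {
        try {
            return InetAddress.getByName(ip).getHostName();
        } catch (UnknownHostException e) {
            return ip;
        }
    }

    /** Returns the cached hostname for an IP, calling the resolver only on a miss. */
    static String hostFor(String ip, Function<String, String> resolver) {
        String host = CACHE.get(ip);
        if (host == null) {
            host = resolver.apply(ip);
            CACHE.put(ip, host);
        }
        return host;
    }

    /** True if the hostname ends with one of the assumed spider suffixes. */
    static boolean isSpider(String host) {
        String h = host.toLowerCase(Locale.ROOT);
        for (String suffix : SPIDER_SUFFIXES) {
            if (h.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stub resolver so the demo runs without network access.
        Function<String, String> stub = ip -> "crawl-" + ip.replace('.', '-') + ".googlebot.com";
        String host = hostFor("66.249.66.1", stub);
        System.out.println(host + " spider? " + isSpider(host));
    }
}
```

In real use the resolver argument would be ReverseDnsCacheSketch::reverseDns;
skipping the lookup entirely (as with "-s") means there is no hostname to
test, which is why spiders cannot be removed in that mode.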

TODO: Add a "-h" parameter to show help options

Typical output now looks like this:

/dspace/bin/dsrun org.dspace.statistics.util.StatisticsImporter -i 
/Users/stuartlewis/Downloads/solroutput.log.2009-06-15 -l
Loading local communities... Found 2
Loading local collections... Found 3
Loading local items... Found 1369
Loading local bitstreams... Found 87
Processing file: /Users/stuartlewis/Downloads/solroutput.log.2009-06-15
Processed 28706 log lines
 - 5158 entries added to solr: 17.968%
 - 12 errors: 0.042%
 - 23536 search engine activity skipped: 81.99%
About to commit data to solr... done!

In this log, about 82% of hits (23536 of 28706 lines) come from the three
major search engines and are skipped.
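
As a quick check of the figures in the output above, the three counts add up
to the number of processed lines, and each percentage is the count divided by
that total. The following sketch redoes the arithmetic; the class and method
names are made up for the example and are not taken from the patch.

```java
import java.util.Locale;

// Hypothetical helper for re-deriving the summary percentages.
class ImportSummarySketch {

    /** Share of part in total, as a percentage; 0 when there is nothing to divide by. */
    static double percent(long part, long total) {
        return total == 0 ? 0.0 : 100.0 * part / total;
    }

    public static void main(String[] args) {
        // Counts copied from the sample run in the email.
        long total = 28706;
        long added = 5158;
        long errors = 12;
        long skipped = 23536;

        System.out.println("Counts sum to total: " + (added + errors + skipped == total));
        System.out.printf(Locale.ROOT, "added:   %.3f%%%n", percent(added, total));
        System.out.printf(Locale.ROOT, "errors:  %.3f%%%n", percent(errors, total));
        System.out.printf(Locale.ROOT, "skipped: %.3f%%%n", percent(skipped, total));
    }
}
```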

> Script to convert legacy dspace.log stats into solr stats records
> -----------------------------------------------------------------
>
>                 Key: DS-364
>                 URL: http://jira.dspace.org/jira/browse/DS-364
>             Project: DSpace 1.x
>          Issue Type: Sub-task
>            Reporter: Stuart Lewis
>            Assignee: Stuart Lewis
>             Fix For: 1.6.0
>
>         Attachments: [DS-364]-for-review.patch, 
> _[DS-364]-for-review_patch_-_version2.patch, solroutput.log.2009-06-15
>
>



        
