[ 
https://jira.duraspace.org/browse/DS-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Dietz updated DS-1030:
----------------------------

    Assignee: Peter Dietz
      Status: Open  (was: Received)

Hi Dan,

Thanks for reporting this issue. I discovered a few Solr bugs of my own in 
the past week, and have been working to make sure Solr behaves the way one 
would expect. 

I've fixed this on my local instance, and I'll send the patch to this issue 
soon. 

The problem I've noticed with markRobotsByIP is that it tries to "update" the 
existing record. What actually happens is that it adds another record to Solr 
with isBot:true and leaves whatever was previously there alone, creating a 
duplicate entry. You can't update a record in Solr in place, but you can add 
and delete records. So we add a record with isBot:true, and then need to 
delete the previous record, which we find by matching on type, ID, ip, and 
date.

You can see the piece of code that I'm talking about here:
https://github.com/DSpace/DSpace/blob/a8fb9fa307dc0503d3b873380b417f9d1984963c/dspace-stats/src/main/java/org/dspace/statistics/SolrLogger.java#L473
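
To illustrate the delete step described above, here is a minimal sketch (not 
the actual patch) of building a Solr query that selects only the stale 
record. The class and method names are hypothetical, and the field names 
(type, id, ip, time, isBot) are assumptions about the statistics schema. The 
extra isBot:false clause is my addition: it is there so the delete cannot 
also match the isBot:true copy that was just added.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of DSpace. Field names are assumed.
final class BotRecordCleanup {
    private BotRecordCleanup() {}

    /** Wraps a value in double quotes, escaping backslashes and quotes. */
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /**
     * Builds a query matching the old record for one logged hit.
     * Quoting the values keeps colons (IPv6 addresses, timestamps)
     * from being parsed as field separators.
     */
    static String buildDeleteQuery(String type, String id, String ip, String time) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("type", type);
        fields.put("id", id);
        fields.put("ip", ip);
        fields.put("time", time);

        StringBuilder query = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            query.append(field.getKey()).append(':')
                 .append(quote(field.getValue())).append(" AND ");
        }
        // Restrict to the stale copy so the new isBot:true record survives.
        query.append("isBot:false");
        return query.toString();
    }
}
```

The resulting string would then be handed to whatever delete-by-query call 
the Solr client offers, followed by a commit. Note that a record indexed 
without any isBot value would not match the isBot:false clause, so such 
records would need a different filter.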

P.S. Patch to be included soon.
                
> markRobotsByIP doesn't remove isBot:false records
> -------------------------------------------------
>
>                 Key: DS-1030
>                 URL: https://jira.duraspace.org/browse/DS-1030
>             Project: DSpace
>          Issue Type: Bug
>          Components: Solr
>    Affects Versions: 1.7.2
>            Reporter: Dan Ishimitsu
>            Assignee: Peter Dietz
>
> The expectation based on the docs is that /dspace/bin/dspace stats-util -m 
> would update isBot:false records to isBot:true (based on IPs in the spider 
> configs). Instead, it appears to create duplicate records with isBot:true, 
> so we end up with all of the original isBot:false records plus an equal 
> number of new isBot:true records. I think it's just missing a delete query 
> at the end to clear the old records matching IPs with isBot:false.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://jira.duraspace.org/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

_______________________________________________
Dspace-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-devel