[ https://jira.duraspace.org/browse/DS-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Dietz updated DS-1030:
----------------------------
Assignee: Peter Dietz
Status: Open (was: Received)
Hi Dan,
Thanks for reporting this issue. I discovered a few Solr bugs of my own in
the past week, and have been working to make sure Solr behaves the way one
would expect.
I've fixed this on my local instance, and I'll attach the patch to this issue
soon.
What I've noticed going wrong with markRobotsByIP is that it wants to
"update" the current record. What actually happens is that it adds another
record to Solr with isBot:true and leaves whatever was previously there
alone, creating a duplicate entry. You can't update a record in Solr, but you
can add and delete records. So we add a record with isBot:true, and then need
to delete the previous record, which we find by matching on type, ID, ip, and
date.
You can see the piece of code that I'm talking about here:
https://github.com/DSpace/DSpace/blob/a8fb9fa307dc0503d3b873380b417f9d1984963c/dspace-stats/src/main/java/org/dspace/statistics/SolrLogger.java#L473
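As a rough illustration of the add-then-delete approach described above, here
is a minimal sketch that uses an in-memory list as a stand-in for the Solr
core. The names MarkBotsSketch, Hit, Index, and markBotsByIp are hypothetical
and are not the SolrLogger or SolrJ API; a real fix would issue the
equivalent add and delete requests against Solr.

```java
import java.util.ArrayList;
import java.util.List;

class MarkBotsSketch {

    /** Hypothetical, simplified stand-in for one usage-statistics record. */
    static final class Hit {
        final int type;
        final int id;
        final String ip;
        final String time;
        final boolean isBot;

        Hit(int type, int id, String ip, String time, boolean isBot) {
            this.type = type;
            this.id = id;
            this.ip = ip;
            this.time = time;
            this.isBot = isBot;
        }

        /** True when both records describe the same event (type, ID, ip, date). */
        boolean sameEvent(Hit other) {
            return type == other.type && id == other.id
                    && ip.equals(other.ip) && time.equals(other.time);
        }
    }

    /** In-memory stand-in for an index that supports add and delete, not update. */
    static final class Index {
        final List<Hit> docs = new ArrayList<>();

        void add(Hit hit) {
            docs.add(hit);
        }

        /** Deletes isBot:false records matching the key on type, ID, ip, and date. */
        void deleteNonBot(Hit key) {
            docs.removeIf(d -> !d.isBot && d.sameEvent(key));
        }

        long count(boolean isBot) {
            return docs.stream().filter(d -> d.isBot == isBot).count();
        }
    }

    /** "Updates" records from a spider IP: add an isBot:true copy, delete the old one. */
    static void markBotsByIp(Index index, String spiderIp) {
        // Collect first so the list is not modified while it is being iterated.
        List<Hit> stale = new ArrayList<>();
        for (Hit d : index.docs) {
            if (!d.isBot && d.ip.equals(spiderIp)) {
                stale.add(d);
            }
        }
        for (Hit old : stale) {
            index.add(new Hit(old.type, old.id, old.ip, old.time, true));
            index.deleteNonBot(old); // the step that is missing in the reported bug
        }
    }

    public static void main(String[] args) {
        Index index = new Index();
        index.add(new Hit(2, 10, "1.2.3.4", "2011-10-01T00:00:00Z", false));
        index.add(new Hit(2, 11, "5.6.7.8", "2011-10-01T00:00:01Z", false));
        markBotsByIp(index, "1.2.3.4");
        System.out.println("isBot:true=" + index.count(true)
                + " isBot:false=" + index.count(false));
    }
}
```

Without the deleteNonBot call, the sketch reproduces the reported behaviour:
the original isBot:false records stay in place next to the new isBot:true
copies.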
P.S. Patch to be included soon.
> markRobotsByIP doesn't remove isBot:false records
> -------------------------------------------------
>
> Key: DS-1030
> URL: https://jira.duraspace.org/browse/DS-1030
> Project: DSpace
> Issue Type: Bug
> Components: Solr
> Affects Versions: 1.7.2
> Reporter: Dan Ishimitsu
> Assignee: Peter Dietz
>
> The expectation based on docs is that /dspace/bin/dspace stats-util -m would
> update isBot:false records to isBot:true (based on IPs in spider configs).
> Instead, it appears to create duplicate records with isBot:true. So we end
> up with all of the original isBot:false records, plus an equal number of new
> isBot:true records. I think it's just missing a delete query at the end to
> clear the old records matching IPs with isBot:false.