Solr Statistics markRobotsByIP can mark too many IP addresses, including IPs not on the IP list
------------------------------------------------------------------------------------------------

                 Key: DS-1008
                 URL: https://jira.duraspace.org/browse/DS-1008
             Project: DSpace
          Issue Type: Bug
          Components: Solr
    Affects Versions: 1.7.2, 1.7.1, 1.7.0, 1.6.2, 1.6.1, 1.6.0
            Reporter: Peter Dietz


The function markRobotsByIP can mark too many IPs as bots, potentially by a factor of 9.

https://github.com/DSpace/DSpace/blob/5366d237afa07005ec485831c9bca1f1c992f01d/dspace-stats/src/main/java/org/dspace/statistics/SolrLogger.java#L473
/* query for ip, exclude results previously set as bots. */
processor.execute("ip:"+ip+ "* AND -isBot:true");

The trailing wildcard matches any sequence of characters, not just whole octets, so ip* over-matches:
10.10.10* matches 10.10.[10, 100-109].*
10.10.10.10* matches 10.10.10.[10, 100-109]
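To make the over-match concrete, here is a small sketch that emulates the trailing wildcard with String.startsWith. It assumes the ip field is indexed as a single untokenized string, so that ip:10.10.10* behaves like a plain prefix match. The class and method names are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: emulates a trailing-wildcard query such as
// ip:10.10.10* with String.startsWith, assuming the ip field is stored as
// one untokenized string (an assumption, not taken from the DSpace schema).
class WildcardOverMatch {

    // Returns every third-octet value t (0-255) for which an address of the
    // form "<firstTwoOctets>.t.1" would be matched by the query "<prefix>*".
    static List<Integer> matchingThirdOctets(String firstTwoOctets, String prefix) {
        List<Integer> matches = new ArrayList<>();
        for (int t = 0; t <= 255; t++) {
            String address = firstTwoOctets + "." + t + ".1";
            if (address.startsWith(prefix)) {
                matches.add(t);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Intended: only 10.10.10.x. Actual: also 10.10.100.x to 10.10.109.x.
        System.out.println(matchingThirdOctets("10.10", "10.10.10"));
        // prints [10, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109]

        // Ending the prefix with a period restores the octet boundary.
        System.out.println(matchingThirdOctets("10.10", "10.10.10."));
        // prints [10]
    }
}
```

One list entry intended to cover a single third octet ends up covering 11 of them in this example.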


My co-worker Brian Stamper suggested:
if (ip.matches("[0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+")) {
    // Full 4-octet address: run as-is, without a wildcard.
    processor.execute("ip:" + ip + " AND -isBot:true");
} else if (ip.endsWith(".")) {
    // Not a full 4-octet address, but ends in a period, so we assume it is
    // something like #.#.#. or #.#. (I don't expect this in the "stock"
    // input from ip-list.com). The wildcard then starts at an octet boundary.
    processor.execute("ip:" + ip + "* AND -isBot:true");
} else if (ip.matches(".*[0-9]")) {
    // Ends with a digit and is not a full 4-octet address, so append ".*"
    // (String.matches() must match the whole string, hence the leading ".*").
    processor.execute("ip:" + ip + ".* AND -isBot:true");
} else {
    log.error("Unexpected IP value: " + ip);
}
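Below is a minimal, self-contained sketch of the suggested branching. It is written as a pure function that returns the query string instead of calling processor.execute, so it can be checked without a Solr instance. The helper name botQueryFor and the null return for unexpected input are assumptions made for illustration; they are not existing DSpace API. The sketch is also slightly stricter than the snippet above, because it accepts only digits and periods in partial addresses.

```java
// Hypothetical helper mirroring the suggested branches: returns the Solr
// query for one entry of the robot IP list, or null if the entry has an
// unexpected shape (the caller would log it as an error).
class BotQueryBuilder {

    static String botQueryFor(String ip) {
        if (ip.matches("[0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+")) {
            // Full 4-octet address: exact term, no wildcard.
            return "ip:" + ip + " AND -isBot:true";
        } else if (ip.matches("[0-9.]*\\.")) {
            // Partial address already ending on an octet boundary, e.g. "10.10."
            return "ip:" + ip + "* AND -isBot:true";
        } else if (ip.matches("[0-9.]*[0-9]")) {
            // Partial address ending in a digit, e.g. "10.10.10":
            // add the period so the wildcard starts at the next octet.
            return "ip:" + ip + ".* AND -isBot:true";
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(botQueryFor("10.10.10.10")); // ip:10.10.10.10 AND -isBot:true
        System.out.println(botQueryFor("10.10."));      // ip:10.10.* AND -isBot:true
        System.out.println(botQueryFor("10.10.10"));    // ip:10.10.10.* AND -isBot:true
        System.out.println(botQueryFor("abc"));         // null
    }
}
```

Returning the query rather than executing it keeps the wildcard rules testable on their own. The same function could then be called from markRobotsByIP before handing the result to the processor.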


--
This message is automatically generated by JIRA.

_______________________________________________
Dspace-devel mailing list
Dspace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-devel