[ http://issues.apache.org/jira/browse/NUTCH-5?page=history ]

Doug Cutting closed NUTCH-5:
----------------------------
> Hit limiter off-by-one bug
> --------------------------
>
>          Key: NUTCH-5
>          URL: http://issues.apache.org/jira/browse/NUTCH-5
>      Project: Nutch
>         Type: Bug
>   Components: searcher
>     Reporter: Andy Liu
>     Priority: Minor
>  Attachments: fix-hitlimiting.patch
>
> When re-searching for more raw hits, the first result of the next site is
> skipped.
>
> From NutchBean.java:
>
> *snip*
>       // get the next raw hit
>       if (rawHitNum >= hits.getLength()) {
>         // optimize query by prohibiting more matches on some excluded sites
>         Query optQuery = (Query) query.clone();
>         for (int i = 0; i < excludedSites.size(); i++) {
>           if (i == MAX_PROHIBITED_TERMS) {
>             break;
>           }
>           optQuery.addProhibitedTerm(((String) excludedSites.get(i)),
>                                      IndexSearcher.HIT_LIMIT_FIELD);
>         }
>         numHitsRaw = (int) (numHitsRaw * RAW_HITS_FACTOR);
>         LOG.info("re-searching for " + numHitsRaw +
>                  " raw hits, query: " + optQuery);
>         hits = searcher.search(optQuery, numHitsRaw);
>         LOG.info("found " + hits.getTotal() + " raw hits");
>         rawHitNum = 0;
>         continue;
>       }
> *snip*
>
> rawHitNum is reset to 0, but the continue statement then runs the for loop's
> increment, so rawHitNum becomes 1 and the first result of the new hits is
> skipped.
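The skipped result can be reproduced outside Nutch with a small, self-contained sketch. Everything below is hypothetical scaffolding and not Nutch code: the toy INDEX, the search() and total() helpers standing in for searcher.search() and hits.getTotal(), the seen set, the growth factor of 2, and the limit of one hit per site are all assumptions made for illustration. Resetting rawHitNum to -1 is shown as one plausible way to compensate for the loop increment; the report does not say whether fix-hitlimiting.patch does this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HitLimiterDemo {
    // Toy ranked index; each hit is "site/doc". Illustrative data only.
    static final List<String> INDEX = Arrays.asList("a/1", "a/2", "a/3", "b/1", "c/1");

    static String site(String hit) {
        return hit.substring(0, hit.indexOf('/'));
    }

    // Hypothetical stand-in for searcher.search(optQuery, numHitsRaw):
    // returns at most n hits whose site is not prohibited.
    static List<String> search(Set<String> prohibited, int n) {
        List<String> out = new ArrayList<>();
        for (String hit : INDEX) {
            if (out.size() < n && !prohibited.contains(site(hit))) {
                out.add(hit);
            }
        }
        return out;
    }

    // Hypothetical stand-in for hits.getTotal(): all matches, fetched or not.
    static int total(Set<String> prohibited) {
        return search(prohibited, INDEX.size()).size();
    }

    // Keeps at most one hit per site. resetValue is what rawHitNum is set to
    // after a re-search: 0 mirrors the snippet in the report, -1 compensates
    // for the rawHitNum++ that still runs after "continue".
    static List<String> collect(int resetValue) {
        int numHitsRaw = 2;
        Set<String> excludedSites = new HashSet<>();
        Set<String> shownSites = new HashSet<>();
        Set<String> seen = new HashSet<>();
        List<String> results = new ArrayList<>();
        List<String> hits = search(excludedSites, numHitsRaw);
        int totalHits = total(excludedSites);

        for (int rawHitNum = 0; rawHitNum < totalHits; rawHitNum++) {
            if (rawHitNum >= hits.size()) {
                // Re-search for more raw hits, prohibiting the excluded sites.
                numHitsRaw *= 2; // assumed growth factor
                hits = search(excludedSites, numHitsRaw);
                totalHits = total(excludedSites);
                rawHitNum = resetValue;
                continue; // the loop increment runs next
            }
            String hit = hits.get(rawHitNum);
            if (!seen.add(hit)) {
                continue; // already handled before the re-search
            }
            if (shownSites.add(site(hit))) {
                results.add(hit);               // first hit for this site
            } else {
                excludedSites.add(site(hit));   // site is over its limit
            }
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(collect(0));  // prints [a/1, c/1]
        System.out.println(collect(-1)); // prints [a/1, b/1, c/1]
    }
}
```

With a reset to 0, the re-search returns b/1 at index 0, but the loop resumes at index 1, so b/1 never reaches the results. With a reset to -1, the increment brings rawHitNum back to 0 and b/1 is kept.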
