Unsubscribe!

On Thu, Jun 4, 2015 at 9:49 AM, Luis Lopez (JIRA) <[email protected]> wrote:
>
> [ https://issues.apache.org/jira/browse/NUTCH-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573114#comment-14573114 ]
>
> Luis Lopez commented on NUTCH-2034:
> -----------------------------------
>
> Yes, we can use a general counter and say that or we could even be more
> specific and count by filter.
>
> > CrawlDB filtered documents counter.
> > -----------------------------------
> >
> >                 Key: NUTCH-2034
> >                 URL: https://issues.apache.org/jira/browse/NUTCH-2034
> >             Project: Nutch
> >          Issue Type: Improvement
> >          Components: crawldb
> >    Affects Versions: 1.10
> >            Reporter: Luis Lopez
> >            Priority: Minor
> >              Labels: counters, crawldb, filter, info, regex
> >             Fix For: 1.11
> >
> >
> > When we are doing big crawls we would like to know how many of the URLs
> > are being discarded by the regex filters, this is only presented in the
> > Inject class:
> > Injector: Total number of urls rejected by filters: 0
> > It will be nice to have a counter in the CrawlDB class so we know in
> > every round how many were discarded by our filters:
> > CrawlDb update: Total number of URLs filtered by regex filters: 31415
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>

