Unsubscribe!

On Thu, Jun 4, 2015 at 9:49 AM, Luis Lopez (JIRA) <[email protected]> wrote:

>
>     [
> https://issues.apache.org/jira/browse/NUTCH-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573114#comment-14573114
> ]
>
> Luis Lopez commented on NUTCH-2034:
> -----------------------------------
>
> Yes, we could use a single general counter for that, or we could be more
> specific and keep a separate count per filter.
>
> > CrawlDB filtered documents counter.
> > -----------------------------------
> >
> >                 Key: NUTCH-2034
> >                 URL: https://issues.apache.org/jira/browse/NUTCH-2034
> >             Project: Nutch
> >          Issue Type: Improvement
> >          Components: crawldb
> >    Affects Versions: 1.10
> >            Reporter: Luis Lopez
> >            Priority: Minor
> >              Labels: counters, crawldb, filter, info, regex
> >             Fix For: 1.11
> >
> >
> > When we are doing big crawls, we would like to know how many of the URLs
> > are being discarded by the regex filters. Currently this is reported only
> > by the Injector:
> > Injector: Total number of urls rejected by filters: 0
> > It would be nice to have a counter in the CrawlDb class so we know, in
> > every round, how many URLs were discarded by our filters:
> > CrawlDb update: Total number of URLs filtered by regex filters: 31415
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>
